Data For Science is active in the implementation and enablement layer of the AI agent stack. They specialize in designing agentic workflows that incorporate specific design patterns for tool use and multi-step reasoning. Unlike general AI consulting, their work focuses on the technical guardrails and failure-mode handling necessary to move autonomous agents into production environments where reliability is non-negotiable.
They matter to the ecosystem because they act as a validation and deployment partner for teams using various agent frameworks. By focusing on evaluation and observability, they help establish the metrics by which agent performance is judged in enterprise settings. Their work in production hardening specifically addresses the inherent unpredictability of agentic systems, providing a bridge between experimental research and reliable corporate software.
Bruno Gonçalves founded Data For Science in 2019, positioning the firm at the intersection of academic research and production-grade software engineering. Before starting the consultancy, Gonçalves was a Data Science fellow at NYU's Center for Data Science and held a tenured faculty position at Aix-Marseille Université. His background in the physics of complex systems informs the company's approach to large language models (LLMs): it treats AI systems as complex systems that require rigorous measurement, not just prompt engineering.
The firm focuses on the "production gap": the space between a successful Jupyter notebook demo and a reliable customer-facing application. Many teams can build a prototype with basic Retrieval-Augmented Generation (RAG) or an agentic framework, but few can keep those systems stable under load and within cost and latency constraints. Data For Science addresses this gap with a tiered consulting model that brings in technical expertise at specific stages of the development lifecycle.
Their primary offerings include a two-week Strategy Sprint for use-case selection and ROI framing, followed by a four-to-six-week MVP build that produces a working prototype and a reference implementation. During these builds, they deliver more than code, including what they call an "evaluation harness." This reflects a broader trend in the agent ecosystem, where the bottleneck is no longer generating a response but proving that the response is accurate and safe. For companies further along, the firm offers "Production Hardening," which covers monitoring, regression tests, and guardrail implementation.
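To make the idea of an evaluation harness and a regression test concrete, here is a minimal sketch in Python. It is not Data For Science's harness: the `EvalCase`, `run_harness`, and `check_regression` names, the substring-match grading rule, the 0.02 tolerance, and the `toy_agent` stand-in are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One hypothetical test case: a prompt and a substring the answer must contain."""
    prompt: str
    must_contain: str


def run_harness(agent: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the agent and return the pass rate (0.0 to 1.0)."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        try:
            output = agent(case.prompt)
        except Exception:
            # A crash counts as a failed case instead of aborting the whole run.
            continue
        # Crude, case-insensitive substring check stands in for a real grader.
        if case.must_contain.lower() in output.lower():
            passed += 1
    return passed / len(cases)


def check_regression(score: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Return True if the new score is no more than `tolerance` below the baseline."""
    return score >= baseline - tolerance


# Usage with a toy stand-in for an agent (hypothetical).
def toy_agent(prompt: str) -> str:
    if "France" in prompt:
        return "The capital of France is Paris."
    raise RuntimeError("tool call failed")  # simulates an agent failure mode


cases = [
    EvalCase("What is the capital of France?", "Paris"),
    EvalCase("What is the capital of Japan?", "Tokyo"),
]
score = run_harness(toy_agent, cases)
print(score)  # 0.5
print(check_regression(score, baseline=0.5))  # True
```

Substring matching is deliberately crude; a real harness would likely use stricter graders. The structure is the point: a fixed set of cases, a single score, and a comparison against a stored baseline, which is what turns an evaluation run into a regression test.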
Gonçalves is also a visible figure in technical education, teaching live training courses for O'Reilly. That teaching background carries over into the consultancy's "Fractional AI Lead" service, which combines technical advisory work with team upskilling. The model is common among boutique AI firms responding to the talent shortage: rather than only building the tools, they train internal teams to maintain them.
Competitively, Data For Science occupies a space distinct from large-scale management consultancies that offer high-level AI strategy. Instead, they compete with other specialized technical agencies and fractional CTO services. Their differentiator is the depth of their technical focus. By emphasizing failure-mode handling and tool-use design patterns for agents, they address the specific engineering challenges of the current LLM era. The company’s output is typically a reference implementation and a handoff plan, aimed at organizations that want to own their AI stack rather than outsource it indefinitely.
The company is based in the New York area but operates globally, reflecting the remote-first nature of modern AI engineering. Their work with LLMs, agents, and observability tools positions them as a bridge between the research-heavy labs and the pragmatism of enterprise software. Through their Substack and various training programs, they contribute to the professionalization of LLM engineering, moving the field away from subjective evaluation toward more empirical methods.
Data For Science is hiring.