Retab provides the perception layer for documents within the AI agent ecosystem. AI agents are often limited by their inability to reliably act upon data trapped in non-textual or poorly formatted documents. By providing a high-accuracy, schema-first extraction layer, Retab allows agent developers to feed structured, validated data into their agentic workflows. This reduces the risk of agents making decisions based on misidentified fields or hallucinations.
The company positions itself as 'intelligent middleware' for AI agents. Its focus on automating the extraction and classification of documents enables agents to perform end-to-end tasks like automated accounting, legal review, and customer support routing without manual data entry. For builders in the agent stack, Retab is a tool for converting unstructured document inputs into the structured outputs necessary for autonomous decision-making.
Retab is a developer platform that focuses on the conversion of unstructured documents into structured data. While document processing is an old problem, the approach has shifted significantly with the rise of foundation models. Traditional Intelligent Document Processing (IDP) relied on brittle templates or narrow OCR models that struggled with layout variations. Retab uses large language models to handle the complexity of documents that typically break older systems. The company provides a set of SDKs and a REST API that allow developers to define a schema and receive validated JSON outputs from PDFs, images, or spreadsheets.
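The schema-first idea described above can be illustrated with a minimal, generic sketch. It does not use Retab's SDK or REST API. The schema format, the field names, and the `validate_extraction` helper are all hypothetical and exist only to show what "define a schema, receive validated JSON" could look like in principle.

```python
import json
from typing import Any

# Hypothetical schema for illustration: field name -> (expected type, required?)
INVOICE_SCHEMA: dict[str, tuple[type, bool]] = {
    "invoice_number": (str, True),
    "total": (float, True),
    "due_date": (str, False),
}


def validate_extraction(raw_json: str, schema: dict[str, tuple[type, bool]]) -> dict[str, Any]:
    """Parse a JSON extraction and check it against a simple schema.

    Raises ValueError listing every problem found, so a caller never
    receives a partially valid record.
    """
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("extraction must be a JSON object")

    errors = []
    for name, (expected_type, required) in schema.items():
        if data.get(name) is None:
            if required:
                errors.append(f"missing required field: {name}")
            continue
        if not isinstance(data[name], expected_type):
            errors.append(f"wrong type for field: {name}")

    # Fields outside the schema are rejected rather than silently passed on.
    for name in sorted(set(data) - set(schema)):
        errors.append(f"unexpected field: {name}")

    if errors:
        raise ValueError("; ".join(errors))
    return data


if __name__ == "__main__":
    record = validate_extraction(
        '{"invoice_number": "INV-1", "total": 120.5}', INVOICE_SCHEMA
    )
    print(record)
```

Rejecting the whole record on any mismatch is one possible design choice. A production system might instead return partial results with per-field error flags.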
The platform is built to serve both visual and code-first users. For users who prefer a graphical interface, Retab offers a workflow builder that connects ingestion sources like Gmail, Outlook, or Zendesk to extraction nodes. For engineers, the Python SDK reduces the process to a few lines of code. Unlike DIY LLM implementations that often struggle with layout preservation or hallucinated fields, Retab uses a technique called k-LLM consensus. This method runs multiple models to confirm data accuracy and provides per-field likelihood scores. These scores matter to enterprise users who need to quantify the uncertainty of automated extractions before the data reaches production systems.
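The consensus idea can be sketched as a per-field majority vote across several candidate extractions. This is only an illustration under stated assumptions, not Retab's actual algorithm: the `field_consensus` function is hypothetical, values are assumed to be simple scalars, and the "likelihood" here is just the fraction of candidates that agree with the winning value.

```python
from collections import Counter
from typing import Any


def field_consensus(extractions: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Majority-vote each field across k candidate extractions.

    Assumes field values are hashable scalars (str, int, float, None).
    A field missing from one candidate counts as a vote for None.
    """
    if not extractions:
        raise ValueError("need at least one extraction")

    k = len(extractions)
    field_names = sorted({name for candidate in extractions for name in candidate})

    result: dict[str, dict[str, Any]] = {}
    for name in field_names:
        votes = Counter(candidate.get(name) for candidate in extractions)
        value, count = votes.most_common(1)[0]
        # Agreement ratio used as a rough per-field likelihood score.
        result[name] = {"value": value, "likelihood": count / k}
    return result


if __name__ == "__main__":
    # Three hypothetical model outputs for the same invoice.
    candidates = [
        {"total": "120.50", "currency": "EUR"},
        {"total": "120.50", "currency": "EUR"},
        {"total": "126.50", "currency": "EUR"},
    ]
    for name, outcome in field_consensus(candidates).items():
        print(name, outcome)
```

A downstream workflow could then route any field whose score falls below a chosen threshold to human review instead of passing it straight to an agent.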
The company was founded in 2023 by Louis de Benoist, Sacha Ichbiah, and Victor Plaisance. The founding team brings an engineering background from the University of Cambridge and École Polytechnique. Retab is headquartered in San Francisco and recently raised $3.5 million in pre-seed funding, backed by investors including Kima Ventures and Dataiku CEO Florian Douetteau. This capital is being used to scale a platform that already claims to have processed over 500 million documents for a range of clients from startups to large organizations like Maersk and Harvard University.
Retab differentiates itself from basic LLM wrappers by focusing on the operational requirements of document processing. This includes automated dataset labeling, model evaluations, and source highlighting, which allows users to trace every extracted field back to its specific location in the original document. By providing different model tiers, the platform allows developers to balance latency and cost requirements. This is particularly relevant for high-volume operations like customer support ticket routing where sub-second response times are necessary.
The competitive environment for Retab includes traditional IDP players and newer, LLM-native competitors. However, Retab's positioning as intelligent middleware suggests a broader ambition: the company is building the infrastructure that allows autonomous systems to interact with the physical world's paperwork. As AI agents move from simple chat interfaces to performing complex operations, their ability to accurately interpret a multi-page contract or a messy spreadsheet becomes the primary bottleneck. Retab aims to be the standard interface that removes this bottleneck, providing the structured ground truth that agents require to make decisions.
AI-powered document automation for extracting structured data from PDFs, images, and spreadsheets.
The official companion repo for Retab's Next.js Quickstart using the app router
A command-line interface to bring the power of Retab right into your terminal
k-LLMs consensus for OpenAI client. Built with 🩷 by the retab team.
A curated list of resources related to structured generation 🔥
The developer starter pack for document processing
The official docs for retab
Retab is hiring