JigsawStack is a utility provider for the AI agent ecosystem, specifically focusing on the 'perception' and 'data acquisition' layers of the agent stack. Agents require reliable ways to interact with the web and extract structured information from unstructured sources. JigsawStack’s AI Web Scraper and vOCR models provide these agents with the tools to see and read digital environments without the overhead of calling larger, more expensive models like GPT-4 for every sub-task.
By offering a token-based pricing model and specialized small models, JigsawStack makes it economically viable to build agents that perform high-volume data scraping or document processing. The company supplies the foundational APIs that agent developers use to feed clean, structured data into their decision-making loops, acting as the 'eyes and ears' of agentic workflows.
JigsawStack is a Singapore-based startup that provides specialized small AI models designed to handle the repetitive, data-heavy tasks that general-purpose LLMs often perform inefficiently. While much of the industry focuses on the race toward massive, trillion-parameter models, JigsawStack takes the opposite approach. They build and host small models optimized for specific primitives like web scraping, optical character recognition (OCR), and speech-to-text translation.
The company is led by Yoeven D Khemlani, a founder with a background in the Singaporean technology sector. Founded in 2024, the team has quickly moved through several iterations of its product, culminating in the release of its "v3" platform. This update introduced a granular, token-based pricing model that moved away from flat API invocation fees. At $1.40 per million tokens, JigsawStack positions its services as a significantly more cost-effective alternative to using frontier models for routine data extraction.
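To make the token-based pricing concrete, the arithmetic can be sketched as below. Only the $1.40 per million tokens rate comes from the text; the workload figures (page count and tokens per page) are hypothetical, and actual billing may count tokens differently.

```python
# Rate quoted in the text above; everything else here is an illustrative assumption.
PRICE_PER_MILLION_TOKENS_USD = 1.40


def estimate_cost_usd(tokens: int,
                      price_per_million: float = PRICE_PER_MILLION_TOKENS_USD) -> float:
    """Return the estimated cost in USD for a given number of tokens."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # Hypothetical workload: 10,000 pages at roughly 2,500 tokens each.
    pages = 10_000
    tokens_per_page = 2_500
    total_tokens = pages * tokens_per_page
    print(f"{total_tokens:,} tokens -> ${estimate_cost_usd(total_tokens):.2f}")
```

Under these assumed numbers, 25 million tokens works out to about $35, which is the kind of estimate a per-token rate makes straightforward.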
The core thesis of JigsawStack is that developers do not need a massive model to extract a price from an HTML page or to detect an object in a photo. In many cases, using a giant model for these tasks is not just overkill; it also introduces unnecessary latency and cost. JigsawStack provides a suite of APIs that cover three main categories: data extraction, transformation, and validation.
Their AI Web Scraper is a central part of their offering. It allows developers to provide a URL and "element prompts" to return structured JSON data. Unlike traditional scrapers that rely on fragile CSS selectors or XPath, JigsawStack uses AI to understand the intent of the request, making it resilient to website layout changes. Their vOCR and object detection models follow a similar pattern, providing high-speed analysis of visual data that integrates directly into a software stack without the need for complex infrastructure management.
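The shape of a request built from a URL and element prompts can be sketched as follows. This is a minimal illustration, not JigsawStack's documented client: the helper name `build_scrape_request` and the JSON field names `url` and `element_prompts` are assumptions, and no network call is made.

```python
import json


def build_scrape_request(url: str, element_prompts: list[str]) -> str:
    """Build a JSON request body from a URL and natural-language element prompts.

    The field names used here are assumptions for illustration only.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    # Drop empty or whitespace-only prompts and trim the rest.
    prompts = [p.strip() for p in element_prompts if p.strip()]
    if not prompts:
        raise ValueError("at least one element prompt is required")
    return json.dumps({"url": url, "element_prompts": prompts})


if __name__ == "__main__":
    # Prompts describe what to extract ("product price") instead of where it
    # lives in the markup (for example a selector such as "div.price > span").
    body = build_scrape_request(
        "https://example.com/item/42",
        ["product title", "product price"],
    )
    print(body)
```

The point of the design is that the prompts name the data wanted, so a page redesign that would break a CSS selector need not change the request.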
JigsawStack positions its offerings as infrastructure automation. The goal is to let engineers focus on the business logic of their applications while the platform handles the plumbing of AI: hosting, scaling, and optimizing the models. This is a deliberate contrast to companies that offer broad, chat-based interfaces. JigsawStack is built for the terminal and the code editor, focusing on developer experience (DX) and reliability.
The company’s growth is tied to the increasing demand for data pipelines that can process unstructured information at scale. As developers build applications that need to browse the web, read documents, and process audio, they require reliable and fast inputs. By focusing on these specific primitives, JigsawStack aims to serve as a foundational layer in that stack. The company is active in the San Francisco and Singapore markets and has raised early-stage funding from investors including Ada Ventures. Its roadmap indicates a continued focus on expanding the library of specialized models, with recent additions including image-to-video and text-to-image capabilities, though its strength remains in the structured data and extraction side of the AI ecosystem.