Moss provides a real-time semantic search runtime built in Rust and WebAssembly designed for conversational AI and voice agents. By distributing compact indexes directly to the browser, mobile devices, or edge environments, the company enables agents to perform retrieval-augmented generation (RAG) with sub-10ms latency. This approach replaces the traditional model of querying remote vector databases, which often introduces significant network delays that can disrupt the flow of voice-driven or real-time copilot applications.
Within the agent ecosystem, Moss functions as a specialized retrieval layer optimized for client-side execution. For developers building agents that require immediate response times or offline functionality, the platform removes the infrastructure burden of managing centralized search servers. By shifting retrieval to the local runtime, Moss supports a decentralized data model that emphasizes execution speed and data privacy, allowing sensitive information to be indexed and searched without leaving the host device.
Moss is architecting a foundational, real-time semantic search runtime explicitly engineered for the next generation of AI agents, voice interfaces, copilots, and multimodal applications. The strategic long-term objective is to decentralize the retrieval layer, shifting it away from monolithic cloud environments and distributing it directly to the edge where intelligence operates. By bridging the gap between vast remote knowledge bases and instantaneous local execution, Moss aims to become the ubiquitous retrieval standard for AI-native applications running in browsers, on mobile devices, and across serverless environments.
The "secret sauce" of Moss lies in its deeply optimized Rust and WebAssembly architecture, which effectively eliminates the latency-heavy network hops inherent to traditional remote vector databases. AI agents routinely perform dozens of lookups per task; at 100–500ms per remote database call, this accumulates into significant friction—a fatal flaw for real-time conversational voice agents. Moss resolves this by indexing, syncing, and pushing a highly compact index down to the local runtime environment. This guarantees sub-10ms lookups, ensuring a fluid experience for voice AI while simultaneously unlocking a "privacy-by-architecture" model where sensitive user data remains on the host device.
Developers connect their source data—including documentation, knowledge bases, and live feeds—via the Moss platform or SDK. Moss seamlessly manages the complexities of indexing, packaging, and distributing the compact index across environments. Utilizing drop-in TypeScript or Python SDKs, the search runtime is embedded natively into the agent's application layer. This enables a zero-infrastructure, zero-network-hop retrieval system that remains reliable offline and syncs intelligently upon reconnection.
Founded in 2024 and headquartered in San Francisco, Moss is a Y Combinator-backed (YC F25) venture, supported by notable investors such as the Pioneer Fund. The company is led by founder Sri Raghu Malireddi, whose expertise includes pivotal work as a Machine Learning Engineer at Grammarly. During his tenure, he spearheaded systems for on-device personalization on the iOS Keyboard—experience that translates directly to the mission of high-efficiency, on-device AI.
Moss operates as a Category Creator in Edge-Native Semantic Retrieval. In stark contrast to established, heavyweight cloud vector databases, Moss acts as a disruptor by rejecting the standard remote-RAG (Retrieval-Augmented Generation) architecture. Instead of centralizing data in the cloud, it localizes retrieval. It bridges the gap between scalable vector search and ultra-low-latency edge computing, carving out a specialized niche that prioritizes speed, privacy, and autonomy over sheer scale.
A search runtime for voice agents, copilots, and multimodal apps.
Moss is hiring.