Want to connect with Hub.xyz?
Join organizations building the agentic web. Get introductions, share updates, and shape the future of .agent.
Is this your company?
Claim this profile to update your info, add products, and connect with the community.
Hub.xyz is relevant to the AI agent ecosystem because it provides the specialized, real-world training data required for agents to interact with physical environments and niche human contexts. While LLMs are trained on digital text, agents often require 'ground truth' data—such as specific accented speech for voice assistants or multi-angle video of physical tasks—to operate effectively in the real world.
The company sits at the 'data acquisition' layer of the agent stack. By offering a single API for real-world collection, it enables agent developers to source training and evaluation data for edge cases that are absent from public datasets. This is particularly important for 'agentic' workflows that involve multimodal understanding, red-teaming, or local language adaptation, where the performance ceiling is often dictated by the quality of the available training data.
The consensus among frontier AI labs is that the supply of high-quality text and image data available on the open web is nearing exhaustion. To continue the scaling laws that have defined the last three years of progress, model developers are looking for 'frontier data'—information that exists in the physical world but has not yet been digitized or structured. Hub.xyz is building the infrastructure to bridge this gap by treating the entire world as a distributed data pipeline.
Founded by Tim Sprecher and based in Palo Alto, Hub.xyz operates a network of over 500,000 contributors across 100 countries. These are not merely digital workers labeling existing datasets; they are active collectors who record audio, capture video of specific scenes, and provide rare language samples that do not exist in common repositories like Common Crawl. By using this global footprint, the company can deliver multimodal streams—audio, image, and video—that are tailored to the specific training needs of its clients, which include major AI labs and enterprise technology firms.
One of the more technical aspects of the Hub.xyz model is its use of 'programmable bandwidth.' The network utilizes the idle internet connections of everyday users to help process and deliver these data streams. This approach mirrors other decentralized physical infrastructure (DePIN) projects, but with a narrow focus on AI intelligence. A recent partnership with SwissBorg highlights this strategy, allowing a large community of users to become active node participants. This creates a recursive loop: users provide the bandwidth and local presence necessary for data collection, and Hub.xyz handles the orchestration and quality assurance.
The company is a Y Combinator-backed startup (YC P26) and recently raised approximately $1.7 million to scale its operations. Its core offering is an API that promises to move from request to data delivery in under two minutes. This speed is a clear attempt to replace the high-friction, high-latency process of hiring traditional data collection firms or managing disparate crowdsourcing platforms.
Because Hub.xyz deals with human-generated data and real-world recordings, the company emphasizes a 'rights-cleared' and ethically sourced model. This is a response to the growing legal and regulatory scrutiny regarding how training data is acquired. Every dataset is licensed from consenting contributors, with a focus on GDPR and CCPA compliance. This focus on provenance is a key differentiator for enterprise customers who need to ensure their models are built on legally sound foundations.
By combining human-in-the-loop validation with automated QA, Hub.xyz attempts to solve the consistency problem that plagues large-scale data collection. The result is an infrastructure layer that allows developers to treat real-world data gathering as a software problem rather than a logistical one. As the industry moves toward specialized and multimodal agents, the demand for this specific, 'long-tail' data is likely to be the bottleneck that Hub.xyz is positioned to solve.
A distributed network turning idle bandwidth into real-time training data for AI systems.
A standalone web map tile server for Blue Marble basemap as raster tiles
A CLI tool for working with geospatial mapping data in DuckDB databases.
Generator to create fontcatalogs to be used by harp.gl
compose file for docker and podman to run a local version of XYZ maps hub, build container images and deployment orchestration
Imposm imports OpenStreetMap data into PostGIS
Tegola is a Mapbox Vector Tile server written in Go
Showcase of web maps made with HERE XYZ
Manage your XYZ Hub or HERE Data Hub spaces from Python.
Font catalog to use with harp.gl renderer
3D web map rendering engine written in TypeScript using three.js
Hub.xyz is hiring
You've explored Hub.xyz.
Join organizations building the agentic web.