Balazs Nemethi, Jul 3, 2026, 2:38 AM
Hi all,
Kickoff for the Evals work stream. Everyone evaluates agents, and almost nobody does it the same way. That gap is the point of this list.
Two threads worth starting:
1. How do you evaluate your agents today, why that method, and what does it miss?
2. Which eval tooling do you actually run in CI?
More information is available at: https://agentcommunity.org/lists/evals
Please read the netiquette before posting: https://agentcommunity.org/lists/netiquette