What: A ranking and routing layer for AI agents. Agents register as callable services (via MCP/tool-use), get used by both humans and other agents, and earn an Elo rating across dimensions that matter: taste, efficiency, depth, reliability. The best agents get called more. Bad agents stop getting called. Natural selection for software.
Who buys: Two-sided. Supply: agent builders (indie devs, vibecoder community, Eric's own agents). Demand: other agents needing capabilities, and humans needing the best agent for a task. The marketplace charges a take rate on agent-to-agent calls + premium for ranked routing.
How it works: MCP is the interop standard — agents expose tools/capabilities as MCP servers.1 Agent Elo wraps this with a registry, routing layer, and Elo system that tracks real usage outcomes. When Agent A calls Agent B to do research, Agent B's Elo updates based on the quality of output (rated by Agent A or the human downstream). Over time, the leaderboard becomes the canonical way to discover and compose agents.
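The rating mechanics above can be sketched concretely. This is a minimal illustration of a per-dimension Elo update after one agent-to-agent call, using the standard Elo formula; the function names, the K-factor, and the four-dimension rating shape are assumptions for illustration, not a spec.

```python
# Sketch: per-dimension Elo update after Agent A calls Agent B and the
# caller (or an LLM judge) scores the interaction on each dimension.
DIMENSIONS = ("taste", "efficiency", "depth", "reliability")

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: probability A 'beats' B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(ratings_a: dict, ratings_b: dict, scores: dict, k: float = 32.0):
    """scores[dim] is 1.0 if A's output was judged better on that dimension,
    0.0 if B's was, 0.5 for a tie."""
    for dim in DIMENSIONS:
        ea = expected_score(ratings_a[dim], ratings_b[dim])
        ratings_a[dim] += k * (scores[dim] - ea)
        ratings_b[dim] += k * ((1.0 - scores[dim]) - (1.0 - ea))
    return ratings_a, ratings_b

a = {d: 1500.0 for d in DIMENSIONS}
b = {d: 1500.0 for d in DIMENSIONS}
update_elo(a, b, {d: 1.0 for d in DIMENSIONS})  # A wins on every dimension
```

With both agents at 1500, a clean win moves each dimension 16 points (K/2), so ratings converge quickly in the early, data-sparse phase.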
| Dimension | Assessment |
|---|---|
| Technical depth | **Strong.** Builds full-stack agents, self-hosts on Mac mini, manages MCP/WhatsApp/Telegram integrations. Comfortable with Claude, OpenRouter, Supabase, Vercel. Has shipped Donna end-to-end. |
| Network (supply side) | **Strong.** 77+ tracked contacts. Agent creator directory project with David Li (VR). Conrad, Penny Yip, BennyKok, Emmanuel all technical + agent-curious. Vibecoder community access. |
| Network (demand side) | **Emerging.** 7 Donna pilot users. Jason, Bruce, Edward all interested. But demand is early — no paying agent-to-agent users yet. |
| Bandwidth | **Constrained.** 7+ active projects. Build deficit day 5. ~10hr/week on Sourcy retainer. This is the bottleneck. Agent Elo competes for deep work time against Blackring, Donna pilot shipping, Wenhao validation. |
| Capital | **Modest.** HK$16K/mo from Sourcy retainer. No external funding. Ilona call exploring €25K-250K for Blackring, not this project. Bootstrap constraint is real. |
| Unfair advantage | **Has one.** Already building multiple agents (supply), already has agent users (demand), already connected to agent creator community. The Donna/avet/OpenClaw stack = built-in contestants for the arena. |
"Agent Elo" sits at the intersection of three emerging categories: AI agent platforms, API/model marketplaces, and AI evaluation/benchmarking. No research firm tracks "competitive agent marketplaces" as a segment — this is pre-category formation.3
| Layer | Size | Basis | Source |
|---|---|---|---|
| Global: AI Agent Platforms | US$5.6B → $47.1B | 2024 → 2030, ~43% CAGR. Includes all autonomous agent infrastructure. | Grand View Research, MarketsandMarkets estimates3 |
| Segment: API Marketplaces | US$4.5B → $8.3B | 2024 → 2028. RapidAPI = ~US$1B valuation. This is the closest revenue analog. | Verified Market Research4 |
| Segment: AI Model Hubs | US$4.5B (HF valuation) | Hugging Face's valuation benchmarks what a model/agent discovery platform can be worth. | Hugging Face Series D, Aug 20235 |
| Segment: LLM Routing | ~US$10-50M ARR | OpenRouter, Martian, Not Diamond — all routing inference to best model. Pre-revenue to early revenue. | Industry estimates6 |
| Addressable: Eric's reach | US$0 | Zero paying users. 7 pilot users for Donna. ~20 agent-curious contacts. This is pre-revenue, pre-product. | CRM data |
| Company | What They Do | Model | Status | Why They're Not Agent Elo |
|---|---|---|---|---|
| LMSys Chatbot Arena7 | Elo leaderboard for LLMs via blind voting | Free research project (UC Berkeley) | 12M+ votes. Canonical. Unfunded. | Ranks models, not agents. No agent-to-agent calls. No marketplace. No routing. |
| OpenRouter6 | Routes inference to cheapest/best model | 5-20% markup on API calls | Growing. Used by Eric + Donna. | Routes models, not agents. No Elo. No quality feedback loop. |
| OpenHub.ai8 | Decentralized AI market economy for agents | Protocol-native marketplace | Early. Docs-stage. | Closest competitor. But protocol-focused, not taste/quality-focused. No Elo mechanism yet. |
| Magentic Marketplace9 | Research env for studying agentic markets | Open-source (Microsoft) | Academic. Oct 2025 paper. | Research, not product. Studies agent economics but doesn't operationalize it. |
| Hugging Face5 | Model hub + community + leaderboards | Freemium SaaS ($4.5B valuation) | ~US$70M ARR (est. 2024) | Hosts models and datasets. No agent composition. No Elo for agents. No routing. |
| CrewAI10 | Multi-agent orchestration framework | Open-source + enterprise ($18M Series A) | Well-funded. Growing. | Orchestration, not marketplace. Agents are internal to your system, not competing with others. |
| LangChain / LangSmith11 | Agent framework + observability | Open-source + SaaS ($25M Series A) | Dominant framework. | Framework, not marketplace. No external agent discovery or ranking. |
| GPT Store (OpenAI)12 | Marketplace for custom GPTs | Platform (no creator revenue share until late 2024) | Widely considered underwhelming. | See "Failed Examples" below. |
| Company | Model | Revenue / Scale | Playbook | Transferability to Eric |
|---|---|---|---|---|
| LMSys Arena | Free community Elo | 12M+ votes, 0 revenue | Blind A/B voting. Academic credibility. Became the benchmark for LLMs. No monetization. | **Mixed.** Proves Elo works for AI. But they chose not to monetize. Can Eric build the monetized version? |
| Hugging Face | Freemium hub | ~US$70M ARR, $4.5B valuation | Open-source model hosting → community → enterprise SaaS. 7+ years. Network effects from model downloads. | **Low.** Took 7 years + massive VC funding ($400M+ raised). Community flywheel requires scale Eric doesn't have. |
| RapidAPI | API marketplace | ~US$45M ARR, $1B valuation (2022) | Aggregated APIs → single interface → developer adoption. 35K+ APIs listed. Usage-based pricing. | **Instructive.** Closest marketplace analog. But required $300M+ funding and years of supply aggregation. Valuation reportedly dropped post-2022. |
| OpenRouter | Model routing | ~US$10-30M ARR (est.) | Unified API for all LLM providers. 5-20% margin on top. Developer-friendly. Low friction. | **High.** Small team, bootstrap-friendly. Routes to best model per task. Agent Elo could be "OpenRouter for agents" — same playbook, different layer. |
| Not Diamond | AI model routing | US$3M seed (2024) | Uses ML to route queries to optimal model. "Best model for every prompt." Quality-based routing. | **High.** Directly validates quality-based routing as a venture category. Same thesis, different layer (models vs agents). |
| Zapier | Integration marketplace | US$230M ARR (2024), profitable | No-code integrations. 7,000+ apps. Marketplace effects. Took 12+ years. | **Instructive.** Shows integration marketplaces can be massive. But 12 years + no AI = different era. |
| Metric | Benchmark (Winner) | Benchmark (Average) | Agent Elo Estimate | Source |
|---|---|---|---|---|
| Take Rate | 20-30% (App Store) | 10-15% (API marketplaces) | 10-15% on routed calls | Industry standard4 |
| ARPU (agent builder) | ~US$200/mo (HF Pro) | ~US$20-50/mo | ~US$0 (free tier) → US$50-200/mo (pro) | HF pricing5 |
| ARPU (agent consumer) | ~US$20/mo (OpenRouter avg) | ~US$5-10/mo | Usage-based, ~US$10-50/mo | OpenRouter estimates6 |
| Paid conversion | 5-8% (dev tools) | 2-4% | 2-5% | Industry benchmarks |
| Gross margin | 70-85% (SaaS) | 60-70% | 60-80% | Depends on proxy vs routing model |
| Cost Component | Per-Unit Cost | Assumption | Source |
|---|---|---|---|
| Elo computation | ~US$0.001/match | Simple rating update per interaction. CPU-bound, negligible. | Standard Elo algorithm |
| LLM judge (quality eval) | ~US$0.003-0.01/eval | Claude Haiku or GPT-4o-mini to rate output quality. ~500 tokens/eval. | Anthropic/OpenAI pricing14 |
| API gateway/proxy | ~US$0.0001/request | If proxying calls through Agent Elo's infra. Cloudflare Workers or similar. | CF Workers pricing |
| Registry hosting | ~US$50-200/mo | Database + API for agent registry. Supabase or Planetscale. | Supabase pricing |
| Leaderboard/frontend | ~US$0-20/mo | Static site on Vercel. Minimal cost. | Vercel free tier |
Break-even = monthly infra costs (~US$200-500) covered by take-rate revenue. At a 10% take rate, that requires US$2K-5K/mo in agent-to-agent call volume. With 500 active agents averaging US$10/mo of volume each, GMV is US$5K/mo and the take is US$500/mo: barely clearing infra, with zero margin. Need either much higher volume or a premium tier.
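The break-even arithmetic, spelled out as a sanity check. Every input is one of the memo's assumptions, not a measured number.

```python
# Break-even check under the memo's assumed inputs.
take_rate = 0.10                 # assumed take rate on routed calls
agents = 500                     # assumed active agents
volume_per_agent = 10.0          # assumed US$/mo routed per agent
gmv = agents * volume_per_agent  # US$5,000/mo gross volume
take = gmv * take_rate           # US$500/mo platform revenue
infra_low, infra_high = 200.0, 500.0  # estimated US$/mo infra band

print(take, take >= infra_low, take >= infra_high)  # 500.0 True True
```

The take only touches the top of the infra band, which is why the memo concludes volume alone can't carry the model at this scale.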
| Signal | Source | Implication |
|---|---|---|
| Anthropic launches MCP (Nov 2024), rapidly adopted by Cursor, Windsurf, Claude Desktop1 | Anthropic blog, GitHub | Interop is standardizing. Agents can now call other agents as tools. This is the prerequisite for Agent Elo. |
| Microsoft publishes Magentic Marketplace paper (Oct 2025) studying agent-to-agent economics9 | Microsoft Research | Big tech is studying this exact problem. Validates the category. But also means incumbents may build it. |
| OpenAI launches ChatGPT Agent (Jan 2026) — agentic mode for browsing, code, actions15 | OpenAI blog | Consumer expectations shifting to agentic. More agents = more need for ranking/routing. |
| CATArena paper (2025) validates tournament-based agent Elo16 | arXiv | Academic proof that competitive ranking works for agents, not just models. |
| OpenHub.ai publishes protocol docs for decentralized agent economy8 | OpenHub docs | Early-stage competitor/validator. Shows builders are converging on agent marketplace concept. |
| Signal | Implication |
|---|---|
| GPT Store remains underwhelming 12+ months after launch | Agent marketplaces are hard. Discovery + quality + monetization all need to work simultaneously. |
| Magentic Marketplace finds "first-proposal bias" creates 10-30x speed advantage over quality9 | In agent markets, fast beats good by default. Elo needs to counter this — reward depth and taste, not just speed. |
| Every major platform (OpenAI, Anthropic, Google) building their own agent ecosystems | Platform risk. If Anthropic builds MCP routing + ranking natively, Agent Elo gets subsumed. |
| Capability | GTM Action | Effort |
|---|---|---|
| Already building Donna, avet, OpenClaw | Register own agents as first supply. Dog-food the ranking system. | Low — already exists |
| Agent creator directory project (with VR/David Li) | Pivot from "directory" to "ranked arena." Same audience, stronger value prop. | Medium — needs product pivot |
| Conrad set up OpenClaw on EC2 | OpenClaw users = natural first agents to register. Every OpenClaw instance = potential arena contestant. | Low — distribution channel exists |
| Vibecoder/agent builder community access | Launch as "leaderboard for your agent." Builders compete for rank. Vanity + distribution incentive. | Medium — needs community activation |
| MCP expertise (Donna already uses MCP) | Build the MCP-native agent registry. Technical credibility. | Medium — needs build time |
| Phase | What | When | Success = |
|---|---|---|---|
| 0. Leaderboard | Static Elo ranking site for agents. LLM judge + human voting. Public. | Feb 2026 | 50+ agents, 500+ votes |
| 1. Registry | MCP-native agent registry. Agents register, expose capabilities, get discovered. | Mar-Apr 2026 | 100+ agents, 10+ agent-to-agent calls/day |
| 2. Routing | Agent Elo routes requests to highest-ranked agent for task type. "OpenRouter for agents." | Q2 2026 | 1K+ calls/day, first revenue (take rate) |
| 3. Marketplace | Full marketplace. Agents earn from being called. Builders get paid. Elo drives distribution. | Q3-Q4 2026 | US$5K/mo GMV, 500+ active agents |
The strongest counterargument:
"This is too early. There aren't enough agents in the wild to rank. MCP adoption is months old. The 'agentic economy' is a research paper, not a market. Eric should focus on building one great agent (Donna) and worry about ranking agents after there are hundreds of them to rank."
My response: This is probably right for now. The timing question is the crux. Agent Elo in Feb 2026 is a leaderboard experiment. Agent Elo in late 2026 — after MCP has matured, after hundreds of vibecoded agents exist, after the GPT Store's failure has been fully digested — could be the right product at the right time. The play is: plant the flag now (leaderboard), build credibility, expand when the market catches up.
Is this a good opportunity for Eric at this time?
Conditionally yes — as a flag-planting side project, not a primary focus.
The thesis is sound. MCP standardization + agent proliferation + GPT Store failure = clear demand for a quality/routing layer. Eric has the unfair advantage: he's building agents, he's connected to agent builders, he understands MCP deeply. The "OpenRouter for agents" pitch is legible and fundable.
But the timing is early. There aren't enough agents to rank yet. The market is pre-category. Eric's bandwidth is already stretched across 7+ projects with a 5-day build deficit. Adding another primary focus would be destructive.
The one thing that would change the answer: If MCP adoption hits an inflection point (1,000+ public MCP servers, major frameworks integrating agent-to-agent calls as default), this becomes urgent. Watch for that signal.
Recommended path:
1. This week: Don't build Agent Elo. Ship Donna. Crack the ring BLE. Protect Monday deep work.
2. This month: Merge the "Agent Creator Directory" project (with VR/David Li) into Agent Elo. Same audience, stronger thesis. Build a static leaderboard as a weekend project. Register Donna + a few public agents. Post to HN.
3. Q2 2026: If the leaderboard gets traction (50+ agents, viral comparison), invest more build time. Add MCP-native registry. Start routing.
4. If it doesn't get traction: No loss. The leaderboard took 1-2 days to build. Agent creator community connections still valuable for Donna distribution.
The minimum viable version: A public webpage that runs 5-10 agents against the same task, rates their output with an LLM judge, and publishes an Elo leaderboard. One page. One afternoon. Plant the flag.
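That one-afternoon MVP is essentially a round-robin tournament loop: run every pair of agents on the same tasks, ask a judge which output wins, update Elo, sort. A self-contained sketch, where `run_agent` and `judge` are stubs standing in for real agent calls and an LLM judge:

```python
# Sketch of the MVP leaderboard: pairwise judge comparisons -> Elo ranking.
import itertools

def expected(ra: float, rb: float) -> float:
    return 1 / (1 + 10 ** ((rb - ra) / 400))

def tournament(agents, tasks, run_agent, judge, k=32):
    """judge(task, out_a, out_b) -> 1.0 if A wins, 0.0 if B wins, 0.5 tie."""
    elo = {name: 1500.0 for name in agents}
    for task in tasks:
        outputs = {name: run_agent(name, task) for name in agents}
        for a, b in itertools.combinations(agents, 2):
            s = judge(task, outputs[a], outputs[b])
            ea = expected(elo[a], elo[b])
            elo[a] += k * (s - ea)
            elo[b] += k * ((1 - s) - (1 - ea))
    return sorted(elo.items(), key=lambda kv: -kv[1])

# Toy run with stub agents and a judge that prefers the longer output.
board = tournament(
    ["donna", "avet"], ["summarize MCP"],
    run_agent=lambda name, task: task * (2 if name == "donna" else 1),
    judge=lambda task, oa, ob: 1.0 if len(oa) > len(ob) else 0.0,
)
```

Swap the stubs for real agent calls and the `parse_score`-style LLM judge, publish `board` as a static page, and that's the flag planted.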