Agent Elo — Deep Market Assessment

I. Thesis

A ranking and routing layer for AI agents.¹ Agents register as callable services (via the Model Context Protocol, MCP), are used by both humans and other agents, and earn Elo ratings across multiple quality dimensions: taste, efficiency, depth, reliability.² The best agents get called more. Natural selection for software.

Core insight: The GPT Store failed because it had no quality signal.³ 3 million custom GPTs with no way to discover what's good, no creator revenue, no ranking. Discovery died. Agent Elo fixes this by making quality measurable, transparent, and market-driven.

II. Dog-Food Signal (Phase 0) ✅

Strongest PMF Signal

Eric is living the problem firsthand. Building Donna (relationship agent), avet (vetting agent), OpenClaw infrastructure.⁴
Conrad Ho just set up OpenClaw on EC2 on his own, making him the first independent pilot user. Eric has breakfast with him Tuesday.⁴
Jason Chan uses competing "Poke" — hosted assistant with privacy concerns. Agent comparison is happening organically in Eric's network.⁴
Real question Eric faces: "Which agent should I use for X?" No ranking system exists. This is the dog-food moment.

III. Market Inflection Point — CRITICAL UPDATE

Previous analysis (with broken web search) said: "6-12 months before peak inflection."

NEW DATA (Feb 8, 2026) says: WE ARE AT THE INFLECTION POINT RIGHT NOW.

OpenClaw Viral Explosion — Past 7 Days⁵

141,000 GitHub stars + 20,900 forks — gained 100K+ stars in one week⁶
2 million visitors in a single week after going viral⁶
Mainstream media coverage: The Verge, Reuters, BBC Science Focus, Mashable, Nature⁵
Government warnings: China's Ministry of Industry issued formal security warning Feb 5, 2026⁷
Security crisis: 1,100+ exposed instances found, malicious skills in ClawHub marketplace⁷,⁸

Metric	Value	Note
OpenClaw Stars	141K	7 days
Exposed Instances	1,100+	via Shodan
Malware Skills	Hundreds	ClawHub
Media Stories	Major	BBC, Reuters

What this means: Agent adoption has crossed the chasm. The security crisis proves the urgent need for quality/safety ranking. ClawHub has no quality layer — it's an open supply chain attack vector.⁸ Agent Elo solves the exact problem the market is feeling right now.

IV. Market Sizing (Layered)

Layer	Market Size	Source
Global AI Agent Market (2026)	US$7.1-7.9B	MarketsAndMarkets⁹
Global AI Agent Market (2030)	US$52.6B (CAGR 46.3%)	MarketsAndMarkets⁹
AI Agent Market (2034)	US$236B	Precedence Research¹⁰
MCP Ecosystem (2026)	5,000+ servers, 6.6M monthly SDK downloads	Abovo Research¹¹
Agent Elo Addressable	US$1-3B (middleware layer, 10-15% of orchestration)	Estimated
Eric's Addressable (2026)	US$0 (pre-product, pre-revenue)	Current state
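
The 2026 and 2030 rows can be cross-checked against the quoted growth rate. The following is a minimal sketch, assuming simple annual compounding and taking the table's figures at face value; the function names are illustrative and do not come from the cited reports.

```python
def project(base_busd: float, cagr: float, years: int) -> float:
    """Compound a base market size (US$ billions) forward at rate `cagr`."""
    return base_busd * (1 + cagr) ** years


def implied_cagr(start_busd: float, end_busd: float, years: int) -> float:
    """Annual growth rate needed to get from start to end in `years`."""
    return (end_busd / start_busd) ** (1 / years) - 1


# Figures copied from the table above (US$ billions).
LOW_2026, HIGH_2026, TARGET_2030, QUOTED_CAGR = 7.1, 7.9, 52.6, 0.463

# Four compounding years (2026 -> 2030) at the quoted rate.
four_year = [project(b, QUOTED_CAGR, 4) for b in (LOW_2026, HIGH_2026)]
# Five compounding years from the upper bound, for comparison.
five_year = project(HIGH_2026, QUOTED_CAGR, 5)
# Growth rate that would be required to hit the 2030 figure in four years.
needed = [implied_cagr(b, TARGET_2030, 4) for b in (HIGH_2026, LOW_2026)]

print(f"4 years at 46.3%: ${four_year[0]:.1f}B to ${four_year[1]:.1f}B")
print(f"5 years at 46.3% from $7.9B: ${five_year:.1f}B")
print(f"CAGR needed over 4 years: {needed[0]:.0%} to {needed[1]:.0%}")
```

Under these assumptions, four years of 46.3% growth from the 2026 range lands in the low-to-mid $30B range, while five years from the upper bound lands close to $52.6B. That suggests the source's base year may be 2025 rather than 2026, which is worth confirming against the report before quoting the row.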

MCP as Tailwind¹¹

MCP adoption velocity is unprecedented: 50K+ GitHub stars, support from OpenAI, Anthropic, Google, Microsoft, AWS¹¹
2026 = "Agentic Year" — models can now reason, act, and operate across multiple tools in real-time¹²
MCP enables "society of agents" — heterogeneous agents from different providers working together seamlessly¹³
Agent Elo sits perfectly at the MCP layer — routing and ranking agents that expose MCP services

Key insight: "Competitive agent marketplace" is NOT a recognized market segment. But the infrastructure is here (MCP), the demand is proven (OpenClaw viral), and the pain is acute (security crisis). Category creation opportunity.

V. Competitive Landscape

Direct Competitors (Agent Quality & Discovery)

Player	What They Do	Why They're Not Agent Elo
LM Arena¹⁴	Crowdsourced Elo for LLMs (GPT-5.1 vs Gemini 3 Pro)	Ranks models, not agents. No marketplace. No routing. Proves Elo works for AI.
OpenRouter¹⁵	Model routing API ($5M ARR, $100M+ GMV)	Routes models, not agents. No Elo. 5.5% take rate. Direct playbook analog.
Agent.ai¹⁶	Professional marketplace for agents	No Elo ranking. Discovery is weak (same GPT Store problem). Transaction fees but no quality signal.
AWS Marketplace¹⁷	Enterprise agent distribution on Bedrock	Enterprise-focused, no public Elo. Governance/audit trails but no taste ranking.
Hugging Face¹⁸	Model hub ($130M revenue, $4.5B valuation)	1M+ models but no agent composition or routing. Community downloads ≠ quality signal.
GPT Store³	OpenAI's failed agent marketplace	THE KEY PROOF POINT. 3M custom GPTs, terrible discovery, no quality ranking, agents disappearing from search.³ Exactly what Agent Elo fixes.

Agent Orchestration Frameworks (Adjacent)

Framework	Focus	Adoption
LangChain/LangGraph¹⁹	Modular orchestration, stateful workflows	Leads GitHub adoption, 86% of copilot spending uses orchestration²⁰
CrewAI¹⁹	Role-based multi-agent collaboration	Production-ready, lightweight, popular for enterprise
AutoGPT¹⁹	Open-source autonomy pioneer (March 2023)	Experimental, sparked the agent movement

Why orchestration ≠ marketplace: LangChain/CrewAI help you build agents. Agent Elo helps you discover, compare, and route to the best agents. Complementary, not competitive. Agent Elo is the distribution layer.

VI. Failed Examples — The "Don't Build This" List

Company	What They Tried	What Killed Them	Lesson for Agent Elo
GPT Store³	Agent marketplace, 3M GPTs	No quality ranking, terrible discovery, no creator revenue	THIS IS THE PROOF. Quality ranking is the missing piece.
ChatGPT Plugins	Tool-use marketplace	Shut down. No quality signal = chaos	Tool discovery without ranking doesn't work. Elo solves this.
Fixie.ai	Agent marketplace → pivoted to enterprise	Consumer marketplace had no adoption	B2C agent marketplace is hard. Start B2B or developer-first.
Crypto agent marketplaces	On-chain agent trading	$50M+ burned, speculation > utility	Avoid crypto rails. Focus on utility, not speculation.
Zapier AI Actions²¹	Natural language API marketplace	Deprecated in 2026, replaced by Zapier Agents	Individual action marketplaces don't scale. Full agents > fragmented tools.

Death Pattern: Quality Signal Failure

Every failed agent marketplace lacked transparent, measurable quality ranking. GPT Store's search problems,³ ChatGPT Plugins shutdown, Fixie pivot — all stem from the same root cause: users can't tell what's good. Elo fixes this.

VII. Unit Economics (Benchmarked)

Revenue Model

Metric	Benchmark (OpenRouter)¹⁵	Agent Elo Estimate
Take rate	5-5.5% on inference spend	10-15% on routed agent calls (higher value-add than model routing)
Monthly GMV (at scale)	$8M (OpenRouter, May 2025)	$1-5M (Year 2 target)
Monthly revenue (at scale)	$400K (OpenRouter)	$100-750K (Year 2, 10-15% take rate)
ARR (at scale)	$5M (OpenRouter 2025)	$1.2-9M (Year 2 target range)

Cost Structure (COGS)

Cost Component	Per-Unit Cost	Notes
Elo computation	~$0.00001/comparison	Lightweight Bradley-Terry model update¹⁴
API gateway/routing	~$0.0001/call	Standard API infrastructure cost
LLM judge evaluation	$0.001-0.01/comparison	DEATH METRIC. At 1M comparisons/day, that is $1K-10K/day. Must optimize or crowdsource.²²
Storage (traces, leaderboard)	$50-200/month	S3/Postgres for audit logs²³

Cost Optimization Path

Start with crowdsourced human voting (like LM Arena²⁴) — $0 COGS, high quality
Hybrid model: Human votes for training data → fine-tuned judge model → reduce LLM API costs by 10-100x
Death scenario: If you rely on GPT-4 for every comparison at scale, COGS explodes. Must solve judge cost before scaling.
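
The judge-cost arithmetic behind the death metric and the 10-100x reduction target can be sketched as follows. This is an illustrative calculation only: the per-comparison prices and the 1M comparisons/day volume come from the cost table, while the function name and the `reduction_factor` parameter are assumptions made for the example.

```python
def daily_judge_cost(comparisons_per_day: int, cost_per_comparison: float,
                     reduction_factor: float = 1.0) -> float:
    """Daily LLM-judge spend in USD, optionally after a cost reduction."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return comparisons_per_day * cost_per_comparison / reduction_factor


COMPARISONS_PER_DAY = 1_000_000
LOW, HIGH = 0.001, 0.01  # USD per comparison, from the cost table

# Compare the raw judge against the two ends of the 10-100x reduction range.
for label, factor in [("raw LLM judge", 1), ("10x cheaper judge", 10),
                      ("100x cheaper judge", 100)]:
    lo = daily_judge_cost(COMPARISONS_PER_DAY, LOW, factor)
    hi = daily_judge_cost(COMPARISONS_PER_DAY, HIGH, factor)
    print(f"{label}: ${lo:,.0f} to ${hi:,.0f} per day")
```

At the raw price the spend is roughly $1,000 to $10,000 per day; a fine-tuned judge at the 100x end of the range would bring that down to roughly $10 to $100 per day, which is why the judge cost has to be solved before volume scales.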

Break-Even Analysis

Scenario	Agents on Platform	Monthly Routed Calls	Monthly Revenue (15% take)	COGS	Gross Margin
Optimistic	500	100K	$15K	$2K	87%
Realistic	200	30K	$4.5K	$1.5K	67%
Pessimistic	50	5K	$750	$500	33%

Break-even: ~200-500 agents, 6-12 months assuming steady growth. OpenRouter took 18 months to reach $5M ARR — similar trajectory expected.¹⁵
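
The gross-margin column in the scenario table can be reproduced directly, and doing so also exposes an assumption the table leaves implicit: the average customer spend per routed call. A minimal sketch, with class and method names invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    routed_calls: int  # monthly routed calls
    revenue: float     # monthly revenue in USD at a 15% take rate
    cogs: float        # monthly cost of goods sold in USD

    def gross_margin(self) -> float:
        """Share of revenue left after COGS."""
        return (self.revenue - self.cogs) / self.revenue

    def implied_spend_per_call(self, take_rate: float = 0.15) -> float:
        """Average customer spend per call that the revenue figure implies."""
        return self.revenue / take_rate / self.routed_calls


# Rows copied from the break-even table.
SCENARIOS = [
    Scenario("Optimistic", 100_000, 15_000, 2_000),
    Scenario("Realistic", 30_000, 4_500, 1_500),
    Scenario("Pessimistic", 5_000, 750, 500),
]

for s in SCENARIOS:
    print(f"{s.name}: margin {s.gross_margin():.0%}, "
          f"implied spend ${s.implied_spend_per_call():.2f}/call")
```

Every row works out to about $1 of customer spend per routed call. If real agent calls are priced in cents rather than dollars, the revenue column shrinks proportionally while the fixed part of COGS does not, so this per-call price is the number to validate first.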

VIII. Live Market Signals (February 2026)

Security Crisis = Quality Ranking Demand

The Verge (Feb 2026): "OpenClaw's AI 'skill' extensions are a security nightmare"⁸
eSecurity Planet: "Hundreds of Malicious Skills Found in OpenClaw's ClawHub"²⁵
1Password VP (Feb 2026): "ClawHub has become an attack surface" for malware distribution⁸
China MIIT Warning (Feb 5, 2026): Formal government warning about OpenClaw security risks⁷

Enterprise Governance Becoming Requirement²³

Microsoft's governance framework now includes dedicated "Govern agents" step for responsible AI²⁶
Audit trail requirements: Proving "what an agent knew, decided, and did — plus who approved it"²³
MCP audit logging for compliance (HIPAA, SOX, PCI-DSS, GDPR)²⁷
Agent Elo's quality ranking becomes part of governance/audit layer

Microsoft Research: First-Proposal Bias²⁸

Microsoft's Magentic Marketplace research found that all LLMs exhibit severe first-proposal bias, giving response speed a 10-30x advantage over quality.²⁸ Without explicit ranking, speed beats quality in agent marketplaces.

Implication: Agent Elo must surface quality explicitly to overcome this bias. Elo leaderboard + routing preferences can rebalance toward quality.
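
One way to picture "Elo leaderboard + routing preferences" is a router that selects on blended per-dimension ratings instead of on which agent answers first. The sketch below is hypothetical: the agent names, the ratings, and the weighted-average blending rule are all assumptions chosen for illustration, not a specification of how Agent Elo would route.

```python
from typing import Dict

# Hypothetical per-dimension Elo ratings; names and numbers are illustrative.
RATINGS: Dict[str, Dict[str, float]] = {
    "fast-agent": {"taste": 1450, "efficiency": 1700,
                   "depth": 1400, "reliability": 1500},
    "deep-agent": {"taste": 1650, "efficiency": 1350,
                   "depth": 1750, "reliability": 1600},
}


def blended_rating(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-dimension Elo ratings."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[dim] * w for dim, w in weights.items()) / total


def route(weights: Dict[str, float]) -> str:
    """Pick the agent with the highest blended rating for these preferences."""
    return max(RATINGS, key=lambda name: blended_rating(RATINGS[name], weights))


# Two caller preference profiles over the four dimensions from the thesis.
quality_first = {"taste": 0.35, "efficiency": 0.05,
                 "depth": 0.35, "reliability": 0.25}
speed_first = {"taste": 0.05, "efficiency": 0.85,
               "depth": 0.05, "reliability": 0.05}

print(route(quality_first), route(speed_first))
```

Because the choice depends only on stored ratings and the caller's weights, arrival order never enters the decision. Whether that is enough to offset first-proposal bias in practice is an open question this sketch does not answer.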

Synthesis: The market is screaming for quality/safety ranking. Security crisis + enterprise governance needs + cognitive biases = perfect storm for Agent Elo's value prop.

IX. GTM Strategy — Founder-Contextualized

Eric's Unfair Advantages

Dog-fooding: Building Donna, avet, OpenClaw infrastructure. Living the agent discovery problem.⁴
Network: Conrad Ho (first OpenClaw pilot), Jason Chan (Poke user), Wenhao (blue-collar AI), Alice (EdTech). Connected to builders and early adopters.⁴
Distribution channels: Agent Creator Directory (pivot candidate), HackerNews audience, own agents as contestants
Technical credibility: Shipped Donna, avet, Sourcy/Brandy, Blackring. Known for fast execution.

Minimum Viable Test (This Week)

MVP: Public Agent Leaderboard (1-2 days elapsed; 4-8 hours of focused build)

Pick 5-10 agents (Donna, avet, OpenClaw skills, Zapier Agents, public MCP servers)
Run same task through all agents (e.g., "Draft email to investor," "Research market size for X," "Schedule 3 meetings")
LLM judge evaluation on taste, efficiency, depth, correctness
Publish Elo leaderboard as static webpage (Vercel)
Post to HackerNews + LinkedIn — "I built an Elo leaderboard for AI agents. Here's what I learned."
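
The ranking step in the list above could be as small as the sketch below, which turns pairwise judge verdicts into a sorted leaderboard. It uses the standard sequential Elo update rather than a batch Bradley-Terry fit, and the K-factor, starting rating, agent names, and sample verdicts are all placeholder assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

K_FACTOR = 32          # assumed update step; tune for the number of comparisons
START_RATING = 1000.0  # assumed starting rating for every agent


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def build_leaderboard(
    verdicts: Iterable[Tuple[str, str, float]],
) -> List[Tuple[str, float]]:
    """Apply sequential Elo updates and return agents sorted by rating.

    Each verdict is (agent_a, agent_b, score_a), where score_a is 1.0 if the
    judge preferred A, 0.0 if it preferred B, and 0.5 for a tie.
    """
    ratings: Dict[str, float] = defaultdict(lambda: START_RATING)
    for agent_a, agent_b, score_a in verdicts:
        exp_a = expected_score(ratings[agent_a], ratings[agent_b])
        delta = K_FACTOR * (score_a - exp_a)
        ratings[agent_a] += delta  # winner gains what the loser gives up
        ratings[agent_b] -= delta
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Placeholder verdicts; in practice these would come from the judge step.
sample = [
    ("agent-x", "agent-y", 1.0),
    ("agent-y", "agent-z", 1.0),
    ("agent-x", "agent-z", 1.0),
    ("agent-y", "agent-x", 0.5),
]
for rank, (name, rating) in enumerate(build_leaderboard(sample), start=1):
    print(f"{rank}. {name}: {rating:.0f}")
```

Sequential Elo depends on the order in which verdicts arrive, so with only 5-10 agents and a few tasks it may be worth shuffling the verdicts several times and averaging. Running one leaderboard per quality dimension, rather than a single blended score, keeps the multi-dimensional framing intact.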

GTM Phases

Phase	Timeline	What to Build	Success Metric
1. Flag Planting	This week	Static leaderboard, 5-10 agents, LLM judge, HN post	500+ HN upvotes, 10+ agent builders reach out
2. Community Leaderboard	Week 2-4	Agent submission form, crowdsourced voting (like LM Arena), auto-update leaderboard	50+ agents submitted, 1K+ community votes
3. API Routing (MVP)	Month 2-3	Agent registry API, routing by Elo + user preferences, 10% take rate	10+ paying customers, $1K MRR
4. MCP Integration	Month 4-6	MCP server for Agent Elo, agents discover/call each other via Elo ranking	100+ agents using Agent Elo routing, $10K MRR
5. Enterprise Governance	Month 6-12	Audit trails, approval flows, compliance (like Microsoft's governance layer²⁶)	3-5 enterprise contracts, $50K+ MRR

Distribution Channels (Prioritized)

HackerNews — Perfect audience (devs, builders, early adopters). Post the leaderboard + learnings.
Agent Creator Directory — Merge into Agent Elo. Use directory traffic to seed leaderboard submissions.
Eric's own agents — Donna, avet become first contestants. "Meta" signal: Eric's using his own infrastructure.
Conrad/Jason network — Early pilot users become evangelists if it works.
LangChain/CrewAI communities — Partner with orchestration frameworks. Agent Elo = distribution for their users.
OpenClaw ecosystem — 141K stars, 2M visitors. Perfect timing to offer quality layer for ClawHub alternatives.⁵

Bandwidth Reality Check

Eric's current capacity: Build deficit day 5. Sourcy retainer (high priority), Blackring (high priority), Donna pilots shipping, Wenhao call tonight.⁴

Agent Elo time requirement:

Phase 1 (MVP leaderboard): 4-8 hours (one deep work session)
Phase 2 (community voting): 12-20 hours over 2 weeks
Phase 3+ (API routing): 40+ hours (conflicts with current priorities)

Recommendation: Ship Phase 1 this week (Monday deep work block). If HN traction is strong (500+ upvotes, agent builders reach out), justify investing Phase 2 time. Otherwise, shelve until bandwidth opens.

X. Red Team Challenge

Why Agent Elo Works

GPT Store failed due to zero quality signal — this fixes that³
OpenClaw viral explosion proves agent adoption is NOW⁵
Security crisis creates urgent demand for quality/safety ranking⁷,⁸
MCP adoption (5K+ servers) provides the interop layer¹¹
LM Arena proves Elo works for AI (crowdsourced, transparent)¹⁴
OpenRouter shows routing business model works ($5M ARR)¹⁵
Eric is dog-fooding — Donna/avet as first contestants⁴
Microsoft research validates the need (first-proposal bias)²⁸
Low COGS if crowdsourced voting (like LM Arena)²⁴
Fast MVP (1-2 days) = low risk, high learning

Why Agent Elo Might Fail

Bandwidth — Eric is maxed (Sourcy, Blackring, Donna)⁴
Chicken-egg: need agents to rank, need ranking to attract agents
LLM judge cost at scale ($1K-10K/day at 1M comparisons/day if not optimized)
First-proposal bias persists even with Elo (speed still wins)²⁸
Agent quality is multi-dimensional (taste ≠ speed ≠ reliability) — hard to collapse into one Elo
Agents gaming the system (like SEO but worse)
Network effects favor incumbents (OpenAI, Microsoft, Google) — they could build this overnight
Category creation is HARD — "agent marketplace" isn't proven yet
B2C agent marketplace failed (Fixie) — why would this work?
Consumer agents aren't mainstream yet (only developers)

Steel-Man Counter-Argument

"Why wouldn't OpenAI/Microsoft/Google just build this into their platforms?"

They will. And they'll do it badly, like GPT Store.³ Here's why Agent Elo still works:

Cross-platform neutrality: Agent Elo ranks agents from all providers. OpenAI won't rank Anthropic agents fairly. Microsoft won't rank Google agents fairly. Switzerland wins.
Open data: Elo leaderboard is public, transparent, community-driven. Platforms want walled gardens. Transparency wins trust.
Developer-first: Eric is a builder shipping agents. Platforms are selling tools. Dog-food credibility wins community.
Speed: Agent Elo MVP ships this week. Platforms take 12-18 months to ship features. First-mover advantage.

Outcome: Agent Elo becomes the de facto neutral ranking layer. Platforms eventually integrate it (like everyone integrated Elo for gaming) or acquire/copy it. Either way, Eric wins by defining the category.

Verdict: YES — Ship MVP This Week

Previous analysis said "conditionally yes, flag-planting side project." NEW DATA changes this to "YES, ship now."

Why the update:

OpenClaw's viral explosion (141K stars, 100K+ of them gained in one week, plus 2M visitors in that same week) proves agent adoption has crossed the chasm.⁵
Security crisis (1,100+ exposed instances, malware in ClawHub) creates urgent demand for quality/safety ranking.⁷,⁸
MCP ecosystem maturity (5K+ servers, 6.6M monthly SDK downloads) means the infrastructure is here.¹¹
GPT Store's continued failure proves quality ranking is the missing piece.³
Eric is dog-fooding — Donna/avet as first contestants = strongest PMF signal.⁴

The Minimum Viable Version:

A public webpage running 5-10 agents against the same task, rating output with LLM judge, publishing Elo leaderboard. One deep work session. 4-8 hours. Post to HackerNews. If 500+ upvotes + agent builders reach out → invest Phase 2 time. If not → no loss, 1 day invested.

Timing is CRITICAL: OpenClaw is viral right now. The security crisis is happening now. The conversation about agent quality is live. Ship the MVP this week while the iron is hot. Wait 3 months and the moment passes.

Bandwidth trade-off: Use Monday deep work block (10am-2pm) for Agent Elo Phase 1 instead of cracking gesture ring BLE. Blackring can wait 1 week. Agent Elo's timing window is closing faster.

Success criteria (Week 1): 500+ HN upvotes, 10+ agent builders reach out, 20+ agents submitted to leaderboard. If hit → justify Phase 2 investment. If miss → shelve and return to Blackring/Donna priorities.

The one thing that would change this verdict: If Eric's bandwidth genuinely can't free up 4-8 hours this week (Sourcy emergency, Ilona meeting prep consumes all time), then defer to Week 2. But not Month 2. The inflection point is now.

References

[1] Generect Blog. "What Is MCP (Model Context Protocol)? The 2026 Guide." MCP overview, agent interoperability.

[2] LMSys Blog. "Chatbot Arena: New models & Elo system update." Elo ranking methodology for AI.

[3] OpenAI Community. "GPT Store discovery problems." GPT Store failure, agents disappearing from search.

[4] Eric's personal state files: projects.json, user.json, daily reports (Feb 8, 2026). Donna, avet, OpenClaw context, Conrad Ho pilot, Jason Chan competitive intel.

[5] OpenClaw website plus multiple media sources (The Verge, Reuters, BBC, Mashable, Nature). Viral traction data.

[6] OpenClaw GitHub repository. 141K stars, 20.9K forks, 2M visitors in 7 days.

[7] Reuters. "China warns of security risks linked to OpenClaw" (Feb 5, 2026). Government warning, formal MIIT statement.

[8] The Verge. "OpenClaw's AI 'skill' extensions are a security nightmare." ClawHub malware, 1Password VP warning.

[9] MarketsAndMarkets. "AI Agents Market Size, Share, Growth." $7.1-7.9B (2026), $52.6B (2030), CAGR 46.3%.

[10] Precedence Research. "AI Agents Market Size, Share and Trends 2025 to 2034." $236B by 2034 projection.

[11] Abovo Research. "MCP 2025 Deep-Research Report." 5K+ MCP servers, 6.6M monthly SDK downloads, 50K+ GitHub stars.

[12] Generect Blog. "2026 as the Agentic Year." Models can reason, act, operate across tools in real-time.

[13] Microsoft Research. "Tool-space interference in the MCP era." Society of agents, heterogeneous collaboration.

[14] LM Arena Leaderboard. Crowdsourced Elo rankings, GPT-5.1 vs Gemini 3 Pro, Bradley-Terry model.

[15] Sacra Research. "OpenRouter at $100M GMV." $5M ARR, 5.5% take rate, $8M monthly GMV (May 2025).

[16] Agent.ai marketplace. Professional network for AI agents, transaction-based.

[17] AWS Marketplace. "AI Agents and Tools." Enterprise agent distribution on Bedrock.

[18] Fueler. "Hugging Face in 2026: Usage, Revenue, Valuation." $130M revenue (2024), $4.5B valuation, 1M+ models.

[19] Iterathon. "Agent Orchestration 2026: LangGraph, CrewAI & AutoGen Guide." Framework comparison, adoption patterns.

[20] JSGuru Jobs. "AI Agent Development Tools 2026." 86% of copilot spending uses orchestration.

[21] Zapier AI Actions documentation. Deprecated 2026, replaced by Zapier Agents.

[22] OpenReview. "Holistic Agent Leaderboard" (ICLR 2026). Agent evaluation infrastructure, 21,730 rollouts cost ~$40K.

[23] Pedowitz Group. "How to Audit AI Agent Decisions and Actions." Audit trail requirements, proving what agent knew/decided/did.

[24] LM Arena. "How It Works." Crowdsourced voting methodology, blind pairwise comparisons.

[25] eSecurity Planet. "Hundreds of Malicious Skills Found in OpenClaw's ClawHub." ClawHub security crisis details.

[26] Microsoft Learn. "Governance and security for AI agents." Enterprise governance framework, "Govern agents" step.

[27] Tetrate. "MCP Audit Logging: Tracing AI Agent Actions for Compliance." HIPAA, SOX, PCI-DSS, GDPR compliance for agents.

[28] Microsoft Research. Magentic Marketplace research paper. First-proposal bias, 10-30x advantage for speed over quality.