Disclosure: This analysis was produced using Claude (Anthropic). Critiques of Anthropic competitors may carry unconscious bias. Read accordingly.
There are five categories of AI agent tooling. Most discourse conflates them. That’s where bad comparisons start.
| Category | What It Is | Examples | Profile |
|---|---|---|---|
| IDE Agent | AI embedded in your editor. Sees diffs, terminal, files. | Cursor, Copilot, Antigravity | Fast iteration, low autonomy |
| CLI Agent | Standalone terminal agent. Plans, executes, iterates. | Claude Code, OpenCode, Codex | High autonomy, long context |
| Agent SDK | Library to build your own agents. | Claude Agent SDK, OpenAI Agents SDK | Programmable, custom flows |
| Agent Framework | Orchestration for multi-agent workflows. Model-agnostic. | LangGraph, CrewAI, AutoGen | Maximum control, steep curve |
| Agent Runtime | Self-hosted platform for persistent agent identity and messaging. | OpenClaw, Open WebUI | Persistent identity, messaging-first |
This taxonomy is analytically useful but has a short shelf life. Cursor launched a CLI. Claude Code runs inside VS Code. Copilot now ships agents, Claude support, and MCP. Google Antigravity has multi-agent orchestration. By late 2026, “IDE Agent” vs “CLI Agent” will likely merge into one category: Coding Agent. And as we’ll see, coding agents themselves are becoming the smaller story.
The fundamental split: Cursor forked VS Code to give AI direct editor access — diffs, terminal, file system, project structure.2 Copilot operates as an extension through public APIs. Cursor reached $1B ARR in under 2 years — fastest B2B SaaS growth in history.4
But revenue ≠ profit. One analyst estimates Cursor pays $650M/year to Anthropic alone against ~$500M revenue — negative 30% gross margin.38 Heavy users consume $838–$2,015 in tokens against a $180/year subscription.39 The switch to usage-based pricing caused an uproar on r/cursor: throttling accusations (1,253 upvotes), quality degradation complaints, and subscribers cancelling.37
More importantly: the fork moat is eroding. Claude Code now runs natively in VS Code as an extension. Google Antigravity offers a free agent-first IDE with Gemini 3 Pro + Claude Opus, multi-agent orchestration, persistent memory, and one-click Cursor import.49 Twitter sentiment tells the same story.
Claude Code is a terminal-based autonomous agent — reads files, runs commands, edits code, tests, iterates. Long-context (200K tokens), extended thinking, deep planning.6 OpenCode is the open-source challenger: 101K stars, 2.5M monthly developers, 75+ model providers, model-agnostic, no vendor lock-in.7
| Tool | Price | Billing / models |
|---|---|---|
| Claude Code (API) | $3/$15 per MTok (Sonnet 4.5)8 | Pay-per-token |
| Claude Code (Max plan) | $100–$200/mo | Subscription |
| OpenCode | Free (BYO API key) | Any provider |
| Codex Cloud (OpenAI) | Included in ChatGPT Plus ($20/mo) | GPT-5 family |
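The pricing gap is easiest to see with a quick calculation. The per-MTok rates below come from the table above; the monthly token volumes are hypothetical round numbers chosen for illustration, not measured usage figures.

```python
# Toy comparison of pay-per-token API cost vs a flat subscription.
# Rates: $3 per MTok input, $15 per MTok output (Sonnet 4.5, from the
# table above). The usage volumes below are invented for illustration.

def monthly_api_cost(input_mtok: float, output_mtok: float,
                     in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Dollar cost for one month of usage, given token volumes in MTok."""
    return input_mtok * in_rate + output_mtok * out_rate

# A hypothetical heavy user: 150M input tokens, 25M output tokens per month.
heavy = monthly_api_cost(150, 25)   # 150*3 + 25*15 = 825
flat = 180 / 12                     # the $180/year subscription, per month

print(f"API cost: ${heavy:.0f}/mo vs flat plan: ${flat:.0f}/mo")
```

This asymmetry is the same arithmetic behind the margin problem described earlier: heavy users consuming hundreds to thousands of dollars in tokens against a $15/month flat plan.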
The practitioner consensus is that there is no consensus. For every dev who swears by Claude Code, there’s another who finds it gets stuck in loops; for every Cursor evangelist, another hitting rate limits. The tools evolve so fast that comparisons expire within weeks.6
Claude Agent SDK (Sep 2025): the engine powering Claude Code, packaged as a library. Single-agent focused, built-in file ops, automatic context compaction.9 OpenAI Agents SDK (Mar 2025): multi-agent coordination with handoffs and guardrails, provider-agnostic.10
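The handoff-and-guardrail pattern these SDKs package up is simple enough to sketch without either library. In the toy below, each “agent” is a plain function and the routing rule is a keyword check; a real system would ask a model to triage. All names and rules are invented for illustration.

```python
# Minimal sketch of the handoff + guardrail control flow that agent SDKs
# formalize. No LLM calls: plain functions stand in for agents.

def billing_agent(msg: str) -> str:
    return f"[billing] handling: {msg}"

def support_agent(msg: str) -> str:
    return f"[support] handling: {msg}"

def triage(msg: str):
    """Hand off to a specialist (a real system would have a model decide)."""
    return billing_agent if "invoice" in msg.lower() else support_agent

def guardrail(reply: str) -> str:
    """Output check: block replies that mention anything marked internal."""
    if "internal" in reply.lower():
        raise ValueError("guardrail tripped: internal data in reply")
    return reply

def run(msg: str) -> str:
    agent = triage(msg)            # the handoff
    return guardrail(agent(msg))   # the guardrail

print(run("Where is my invoice?"))
```

The value an SDK adds on top of this skeleton is the plumbing: conversation state across handoffs, retries, tracing, and model-call management.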
Most discourse ignores this layer. It shouldn’t. 86% of copilot spending ($7.2B) goes to agent-based systems. LangGraph has 4.2M monthly downloads. The agentic AI market hit $7.55B in 2025.46 Frameworks are the least glamorous but most production-relevant category.
OpenClaw is not a coding tool — it’s a persistent agent runtime for giving AI a phone number, messaging channels, and long-term memory. 188K GitHub stars. 60K Discord members. 135K exposed instances.12
And it’s a security disaster.
The persistent agent runtime concept is valid; OpenClaw as the implementation is a liability. Nor is it alone: OpenCode has its own critical vulnerability, CVE-2026-22812, an unauthenticated remote code execution.35
| Company | What Happened | Lesson |
|---|---|---|
| Windsurf | $3B OpenAI deal collapsed. Google hired leadership ($2.4B), Cognition got the product.15 | Interface risk: interface-layer tools are acquisition targets, not durable businesses. |
| Supermaven | Acquired by Cursor.5 | Fast completions became a feature, not a product. |
| Graphite | Acquired by Cursor for >$290M.5 | Code review is a feature of the IDE, not a standalone SaaS. |
| Assistants API | Deprecated Mar 2025, sunset H1 2026.16 | Even first-party APIs get killed. Build on abstractions. |
Does AI actually make developers faster? No, says the METR study (Jul 2025): 16 experienced developers, 246 real tasks, randomised assignment. AI-assisted developers were 19% slower — but perceived themselves as 20% faster. 69% kept using AI anyway. The 95% CI runs from −26% to +9%, so the true effect could point either way.17,18
Yes, says the Cursor/UChicago study: +39% merged PRs with a stable revert rate.19 But a CMU counter-study found that while velocity went up, static analysis warnings and code complexity went up with it.42 And Faros AI data from 10,000 developers shows AI-generated code running 154% larger per PR with 9% more bugs.38
Both studies have problems. METR: 16 participants, a within-subjects design, dated tools, and a CI crossing zero.40 Cursor’s: published on the company’s own blog (sponsor bias), with the CMU counter-data undermining the headline. The “19% slower” number became a cultural weapon all the same.
A year ago, “vibe coding” was the hot term — prompt-driven, rapid, ship-fast. Then the consensus hardened against it, fast.
The backlash is not anti-AI. It’s anti-uncritical-AI. The emerging best practice: “80% AI, maintain the mental model.” Karpathy himself — who coined “vibe coding” — now distinguishes “agentic engineering” as the serious version: you orchestrate agents that write the code while you provide the oversight. Yet Karpathy also just spent days hand-writing a 200-line pure Python GPT. The person who coined vibe coding still hand-writes the code that matters most.20
Model Context Protocol was supposed to be the interop standard that made tool choice irrelevant — connect any agent to any data source. It’s adopted by OpenAI, native in Claude Desktop, supported by Google DeepMind.27
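On the wire, MCP is JSON-RPC 2.0, and a tool invocation is a small, inspectable message. The sketch below shows the shape of a `tools/call` request (the method name follows the spec; the tool name and arguments are invented for illustration):

```python
import json

# Sketch of an MCP tool-call request as it travels between client and
# server. MCP messages are JSON-RPC 2.0; "tools/call" is the method the
# spec defines for invoking a tool. The tool name and arguments here
# are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",                    # hypothetical tool
        "arguments": {"query": "refund policy"},  # hypothetical args
    },
}

wire = json.dumps(request)
decoded = json.loads(wire)
print(decoded["method"], decoded["params"]["name"])
```

Note what the envelope does not carry: no trust boundary around the tool’s response. Whatever text the server returns flows straight back into the model’s context, which is exactly where the prompt injection concern lives.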
The critics, though, are credible and damning.
MCP solves data access interop partially. Agent memory, preferences, identity, and workflow portability are nowhere close. The promise of “plug any agent into any tool” is real in demos. In production, it’s premature.
Anthropic’s evolving claim:
| Date | Who | Claim |
|---|---|---|
| Mar 2025 | Dario Amodei | “In 3–6 months, AI is writing 90% of the code.” |
| Oct 2025 | Dario Amodei | “90% of code at Anthropic written by AI” — then adds: “you need just as many engineers. Maybe more.”57 |
| Feb 2026 | Mike Krieger (CPO) | “Today it’s effectively 100%.” Teams shipping 2–3K line PRs generated by Claude.59 |
The bait-and-switch: “100% AI code” means the first draft is AI-generated. Humans do all the architecture, prompting, reviewing, editing, testing, and deployment. It’s like saying “the printer writes 100% of the books.”
While this analysis was comparing coding tools across 65 sources, the real action moved to a different layer.
Engineers could validate AI output, so that’s where agents started. Now agents are expanding across three deployment waves:
| Wave | Agent Surface | Automates | When |
|---|---|---|---|
| 1. Coding | Files, terminal, git | Junior dev work | 2023–2025 |
| 2. Ops | Pipelines, monitoring, infra | DevOps / SRE tasks | 2025–2026 |
| 3. Knowledge work | Email, calendar, Slack, browser, docs, spreadsheets | Analyst / coordinator tasks | 2026 onward |
Each wave targets more expensive humans. AgentOps is already a named discipline — Harness, Red Hat, Pulumi, and Microsoft all published architecture guides in Jan–Feb 2026.64 Cognition’s Devin is deployed at Infosys for 10K+ users — not for new code, but for brownfield engineering, tech debt reduction, and production maintenance.65
Anthropic’s Claude Cowork (Jan 2026, Windows version Feb 12): a desktop agent with access to files, browser, email, Slack, spreadsheets.61 Microsoft 365 Copilot has been doing “one agent across all your apps” since November 2023, with 1M+ enterprise users and GDPR compliance. Google’s Gemini integrations span Workspace, Chrome, and Android.
The full Claude ecosystem now spans: Claude Code (developers), Cowork (everyone else), Chrome extension (browser), Excel integration (spreadsheets), and 50+ MCP connectors (Notion, Gmail, Slack, Figma). The pitch: one agent identity across your entire digital life.
The popular formulation: “The tool is less important than agency — your prompting skill, context architecture, and conversation design determine outcomes more than which IDE you use.”
This claim is partly right and partly lazy. The honest answer depends on the layer:
The refined version: “For code generation, prompting matters more than vendor. For code review, tool UX matters a lot. For knowledge work, identity and context are the whole game.”
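“Context architecture” sounds abstract, but mechanically it is layered text assembled into one prompt with a fixed precedence. A toy sketch, where the layer names, ordering, and contents are all illustrative rather than any vendor’s actual format:

```python
# Toy sketch of layered context assembly: the "hand-crafted agent
# architecture" is, mechanically, string composition with precedence.
# Layer names and contents are invented for illustration.

LAYERS = ["system", "persona", "project", "task"]

def assemble_context(parts: dict) -> str:
    """Concatenate layers in a fixed precedence order, skipping any
    layer that is missing."""
    chunks = [f"## {name}\n{parts[name]}" for name in LAYERS if name in parts]
    return "\n\n".join(chunks)

prompt = assemble_context({
    "system": "You are a careful code reviewer.",
    "task": "Review the diff for concurrency bugs.",
})
print(prompt)
```

The bet in the “model > everything” thesis above is that this assembly step disappears: a strong enough model infers the persona and project layers on its own.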
And the question nobody asks: what if agency itself is a temporary layer? If models improve enough to infer your preferences, relationships, and working style from a few interactions, hand-crafted agent architectures (system prompts, persona files, custom commands) become scaffolding you don’t need. The answer would be “model > everything” — and that’s the one thesis neither side of the tool-vs-agency debate has tested.
The coding tool debate is resolving through convergence. Cursor, Claude Code, Antigravity, and Copilot are racing toward the same feature set. The remaining differentiators are price ($20/mo vs $100+), review UX, and enterprise compliance. Within 12 months, vendor choice within the “coding agent” category will matter less than it does today.
“AI writes 100% of the code” is technically true and strategically misleading. It means first drafts. Engineers architect, prompt, review, test, deploy. Anthropic is hiring more engineers, not fewer. The role shifts from coder to agent manager — promotion to a higher abstraction level.
The real battleground has moved to knowledge work. Coding was the beachhead (engineers can validate AI output). Ops is the expansion (deploy, monitor, fix). Knowledge work is the destination (email, Slack, docs, CRM — every app on a laptop). Each wave targets more expensive humans. But this wave has a regulatory speed limit that most commentary ignores. Microsoft 365 Copilot is already there with enterprise compliance. Newcomers face GDPR, SOC 2, and data residency requirements before they can compete.
OpenClaw is a security liability. The persistent agent runtime concept is valid. The implementation fails every basic security check. 12–20% of its marketplace is malicious. The supply chain analogy is early npm with root access.
MCP is contested. Adopted everywhere, trusted by almost nobody who’s deployed it at scale. Simon Willison’s prompt injection critique alone should pause any production deployment. The interop promise is real for demos, premature for production.
The one-sentence version: The tools are converging. The question is no longer “which coding tool?” but “which agent platform gets access to your entire digital life?” — and that answer depends on identity, context, and compliance, not which IDE is trendiest this month.