Disclosure: This analysis was produced using Claude (Anthropic). Critiques of Anthropic competitors may carry unconscious bias. Read accordingly.
There are five categories of AI agent tooling. Most discourse conflates them. That’s where bad comparisons start.
| Category | What It Is | Examples | Profile |
|---|---|---|---|
| IDE Agent | AI embedded in your editor. Sees diffs, terminal, files. | Cursor, Copilot, Antigravity | Fast iteration, low autonomy |
| CLI Agent | Standalone terminal agent. Plans, executes, iterates. | Claude Code, OpenCode, Codex | High autonomy, long context |
| Agent SDK | Library to build your own agents. | Claude Agent SDK, OpenAI Agents SDK | Programmable, custom flows |
| Agent Framework | Orchestration for multi-agent workflows. Model-agnostic. | LangGraph, CrewAI, AutoGen | Maximum control, steep curve |
| Agent Runtime | Self-hosted platform for persistent agent identity and messaging. | OpenClaw, Open WebUI | Persistent identity, messaging-first |
This taxonomy is analytically useful but has a short shelf life. Cursor launched a CLI. Claude Code runs inside VS Code. Copilot now ships agents, Claude support, and MCP. Google Antigravity has multi-agent orchestration. By late 2026, “IDE Agent” vs “CLI Agent” will likely merge into one category: Coding Agent. And as we’ll see, coding agents themselves are becoming the smaller story.
The fundamental split: Cursor forked VS Code to give AI direct editor access — diffs, terminal, file system, project structure.2 Copilot operates as an extension through public APIs. Cursor reached $1B ARR in under 2 years — fastest B2B SaaS growth in history.4
But revenue ≠ profit. One analyst estimates Cursor pays $650M/year to Anthropic alone against ~$500M revenue — negative 30% gross margin.38 Heavy users consume $838–$2,015 in tokens against a $180/year subscription.39 The switch to usage-based pricing caused an uproar on r/cursor: throttling accusations (1,253 upvotes), quality degradation complaints, and subscribers cancelling.37
More importantly: the fork moat is eroding. Claude Code now runs natively in VS Code as an extension. Google Antigravity offers a free agent-first IDE with Gemini 3 Pro + Claude Opus, multi-agent orchestration, persistent memory, and one-click Cursor import.49 Twitter sentiment tells the same story.
Claude Code is a terminal-based autonomous agent — reads files, runs commands, edits code, tests, iterates. Long-context (200K tokens), extended thinking, deep planning.6 OpenCode is the open-source challenger: 101K stars, 2.5M monthly developers, 75+ model providers, model-agnostic, no vendor lock-in.7
| Tool | Price | Billing / models |
|---|---|---|
| Claude Code (API) | $3/$15 per MTok (Sonnet 4.5)8 | Pay-per-token |
| Claude Code (Max plan) | $100–$200/mo | Subscription |
| OpenCode | Free (BYO API key) | Any provider |
| Codex Cloud (OpenAI) | Included in ChatGPT Plus ($20/mo) | GPT-5 family |
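The pricing gap is easiest to see with a quick calculation. The per-MTok rates below come from the table above; the monthly token volumes are hypothetical round numbers chosen for illustration, not measured usage figures.

```python
# Toy comparison of pay-per-token API cost vs a flat subscription.
# Rates: $3 per MTok input, $15 per MTok output (Sonnet 4.5, from the
# table above). The usage volumes below are invented for illustration.

def monthly_api_cost(input_mtok: float, output_mtok: float,
                     in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Dollar cost for one month of usage, given token volumes in MTok."""
    return input_mtok * in_rate + output_mtok * out_rate

# A hypothetical heavy user: 150M input tokens, 25M output tokens per month.
heavy = monthly_api_cost(150, 25)   # 150*3 + 25*15 = 825
flat = 180 / 12                     # the $180/year subscription, per month

print(f"API cost: ${heavy:.0f}/mo vs flat plan: ${flat:.0f}/mo")
```

This asymmetry is the same arithmetic behind the margin problem described earlier: heavy users consuming hundreds to thousands of dollars in tokens against a $15/month flat plan.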
The practitioner consensus is that there is no consensus. For every dev who swears by Claude Code, there’s another who finds it gets stuck in loops; for every Cursor evangelist, another hitting rate limits. The tools evolve so fast that comparisons expire within weeks.6
Claude Agent SDK (Sep 2025): the engine powering Claude Code, packaged as a library. Single-agent focused, built-in file ops, automatic context compaction.9 OpenAI Agents SDK (Mar 2025): multi-agent coordination with handoffs and guardrails, provider-agnostic.10
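The handoff-and-guardrail pattern these SDKs package up is simple enough to sketch without either library. In the toy below, each “agent” is a plain function and the routing rule is a keyword check; a real system would ask a model to triage. All names and rules are invented for illustration.

```python
# Minimal sketch of the handoff + guardrail control flow that agent SDKs
# formalize. No LLM calls: plain functions stand in for agents.

def billing_agent(msg: str) -> str:
    return f"[billing] handling: {msg}"

def support_agent(msg: str) -> str:
    return f"[support] handling: {msg}"

def triage(msg: str):
    """Hand off to a specialist (a real system would have a model decide)."""
    return billing_agent if "invoice" in msg.lower() else support_agent

def guardrail(reply: str) -> str:
    """Output check: block replies that mention anything marked internal."""
    if "internal" in reply.lower():
        raise ValueError("guardrail tripped: internal data in reply")
    return reply

def run(msg: str) -> str:
    agent = triage(msg)            # the handoff
    return guardrail(agent(msg))   # the guardrail

print(run("Where is my invoice?"))
```

The value an SDK adds on top of this skeleton is the plumbing: conversation state across handoffs, retries, tracing, and model-call management.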
Most discourse ignores this layer. It shouldn’t. 86% of copilot spending ($7.2B) goes to agent-based systems. LangGraph has 4.2M monthly downloads. The agentic AI market hit $7.55B in 2025.46 Frameworks are the least glamorous but most production-relevant category.
OpenClaw is not a coding tool — it’s a persistent agent runtime for giving AI a phone number, messaging channels, and long-term memory. 188K GitHub stars. 60K Discord members. 135K exposed instances.12
And it’s a security disaster.
The persistent agent runtime concept is valid; OpenClaw as the implementation is a liability. Nor is it alone: OpenCode has its own critical vulnerability, CVE-2026-22812, an unauthenticated remote code execution.35
| Company | What Happened | Lesson |
|---|---|---|
| Windsurf | $3B OpenAI deal collapsed. Google hired leadership ($2.4B), Cognition got the product.15 | Interface risk: interface-layer tools are acquisition targets, not durable businesses. |
| Supermaven | Acquired by Cursor.5 | Fast completions became a feature, not a product. |
| Graphite | Acquired by Cursor for >$290M.5 | Code review is a feature of the IDE, not a standalone SaaS. |
| Assistants API | Deprecated Mar 2025, sunset H1 2026.16 | Even first-party APIs get killed. Build on abstractions. |
Does AI actually make developers faster? No, says the METR study (Jul 2025): 16 experienced developers, 246 real tasks, randomised assignment. AI-assisted developers were 19% slower — but perceived themselves as 20% faster. 69% kept using AI anyway. The 95% CI runs from −26% to +9%, so the true effect could point either way.17,18
Yes, says the Cursor/UChicago study: +39% merged PRs with a stable revert rate.19 But a CMU counter-study found that while velocity went up, static analysis warnings and code complexity went up with it.42 And Faros AI data from 10,000 developers shows AI-generated code running 154% larger per PR with 9% more bugs.38
Both studies have problems. METR: 16 participants, a within-subjects design, dated tools, and a CI crossing zero.40 Cursor’s: published on the company’s own blog (sponsor bias), with the CMU counter-data undermining the headline. The “19% slower” number became a cultural weapon all the same.
A year ago, “vibe coding” was the hot term — prompt-driven, rapid, ship-fast. Then the consensus hardened against it, fast.
The backlash is not anti-AI. It’s anti-uncritical-AI. The emerging best practice: “80% AI, maintain the mental model.” Karpathy himself — who coined “vibe coding” — now distinguishes “agentic engineering” as the serious version: you orchestrate agents that write the code while you provide the oversight. Yet Karpathy also just spent days hand-writing a 200-line pure Python GPT. The person who coined vibe coding still hand-writes the code that matters most.20
Model Context Protocol was supposed to be the interop standard that made tool choice irrelevant — connect any agent to any data source. It’s adopted by OpenAI, native in Claude Desktop, supported by Google DeepMind.27
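On the wire, MCP is JSON-RPC 2.0, and a tool invocation is a small, inspectable message. The sketch below shows the shape of a `tools/call` request (the method name follows the spec; the tool name and arguments are invented for illustration):

```python
import json

# Sketch of an MCP tool-call request as it travels between client and
# server. MCP messages are JSON-RPC 2.0; "tools/call" is the method the
# spec defines for invoking a tool. The tool name and arguments here
# are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",                    # hypothetical tool
        "arguments": {"query": "refund policy"},  # hypothetical args
    },
}

wire = json.dumps(request)
decoded = json.loads(wire)
print(decoded["method"], decoded["params"]["name"])
```

Note what the envelope does not carry: no trust boundary around the tool’s response. Whatever text the server returns flows straight back into the model’s context, which is exactly where the prompt injection concern lives.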
The critics, though, are credible and damning.
MCP solves data access interop partially. Agent memory, preferences, identity, and workflow portability are nowhere close. The promise of “plug any agent into any tool” is real in demos. In production, it’s premature.
Anthropic’s evolving claim:
| Date | Who | Claim |
|---|---|---|
| Mar 2025 | Dario Amodei | “In 3–6 months, AI is writing 90% of the code.” |
| Oct 2025 | Dario Amodei | “90% of code at Anthropic written by AI” — then adds: “you need just as many engineers. Maybe more.”57 |
| Feb 2026 | Mike Krieger (CPO) | “Today it’s effectively 100%.” Teams shipping 2–3K line PRs generated by Claude.59 |
The bait-and-switch: “100% AI code” means the first draft is AI-generated. Humans do all the architecture, prompting, reviewing, editing, testing, and deployment. It’s like saying “the printer writes 100% of the books.”
While this analysis was comparing coding tools across 65 sources, the real action moved to a different layer.
Engineers could validate AI output, so that’s where agents started. Now agents are expanding across three deployment waves:
| Wave | Agent Surface | Automates | When |
|---|---|---|---|
| 1. Coding | Files, terminal, git | Junior dev work | 2023–2025 |
| 2. Ops | Pipelines, monitoring, infra | DevOps / SRE tasks | 2025–2026 |
| 3. Knowledge work | Email, calendar, Slack, browser, docs, spreadsheets | Analyst / coordinator tasks | 2026 onward |
Each wave targets more expensive humans. AgentOps is already a named discipline — Harness, Red Hat, Pulumi, and Microsoft all published architecture guides in Jan–Feb 2026.64 Cognition’s Devin is deployed at Infosys for 10K+ users — not for new code, but for brownfield engineering, tech debt reduction, and production maintenance.65
Anthropic’s Claude Cowork (Jan 2026, Windows version Feb 12): a desktop agent with access to files, browser, email, Slack, spreadsheets.61 Microsoft 365 Copilot has been doing “one agent across all your apps” since November 2023, with 1M+ enterprise users and GDPR compliance. Google’s Gemini integrations span Workspace, Chrome, and Android.
The full Claude ecosystem now spans: Claude Code (developers), Cowork (everyone else), Chrome extension (browser), Excel integration (spreadsheets), and 50+ MCP connectors (Notion, Gmail, Slack, Figma). The pitch: one agent identity across your entire digital life.
The popular formulation: “The tool is less important than agency — your prompting skill, context architecture, and conversation design determine outcomes more than which IDE you use.”
This claim is partly right and partly lazy. The honest answer depends on the layer:
The refined version: “For code generation, prompting matters more than vendor. For code review, tool UX matters a lot. For knowledge work, identity and context are the whole game.”
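“Context architecture” sounds abstract, but mechanically it is layered text assembled into one prompt with a fixed precedence. A toy sketch, where the layer names, ordering, and contents are all illustrative rather than any vendor’s actual format:

```python
# Toy sketch of layered context assembly: the "hand-crafted agent
# architecture" is, mechanically, string composition with precedence.
# Layer names and contents are invented for illustration.

LAYERS = ["system", "persona", "project", "task"]

def assemble_context(parts: dict) -> str:
    """Concatenate layers in a fixed precedence order, skipping any
    layer that is missing."""
    chunks = [f"## {name}\n{parts[name]}" for name in LAYERS if name in parts]
    return "\n\n".join(chunks)

prompt = assemble_context({
    "system": "You are a careful code reviewer.",
    "task": "Review the diff for concurrency bugs.",
})
print(prompt)
```

The bet in the “model > everything” thesis above is that this assembly step disappears: a strong enough model infers the persona and project layers on its own.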
And the question nobody asks: what if agency itself is a temporary layer? If models improve enough to infer your preferences, relationships, and working style from a few interactions, hand-crafted agent architectures (system prompts, persona files, custom commands) become scaffolding you don’t need. The answer would be “model > everything” — and that’s the one thesis neither side of the tool-vs-agency debate has tested.
The coding tool debate is resolving through convergence. Cursor, Claude Code, Antigravity, and Copilot are racing toward the same feature set. The remaining differentiators are price ($20/mo vs $100+), review UX, and enterprise compliance. Within 12 months, vendor choice within the “coding agent” category will matter less than it does today.
“AI writes 100% of the code” is technically true and strategically misleading. It means first drafts. Engineers architect, prompt, review, test, deploy. Anthropic is hiring more engineers, not fewer. The role shifts from coder to agent manager — promotion to a higher abstraction level.
The real battleground has moved to knowledge work. Coding was the beachhead (engineers can validate AI output). Ops is the expansion (deploy, monitor, fix). Knowledge work is the destination (email, Slack, docs, CRM — every app on a laptop). Each wave targets more expensive humans. But this wave has a regulatory speed limit that most commentary ignores. Microsoft 365 Copilot is already there with enterprise compliance. Newcomers face GDPR, SOC 2, and data residency requirements before they can compete.
OpenClaw is a security liability. The persistent agent runtime concept is valid. The implementation fails every basic security check. 12–20% of its marketplace is malicious. The supply chain analogy is early npm with root access.
MCP is contested. Adopted everywhere, trusted by almost nobody who’s deployed it at scale. Simon Willison’s prompt injection critique alone should pause any production deployment. The interop promise is real for demos, premature for production.
The one-sentence version: The tools are converging. The question is no longer “which coding tool?” but “which agent platform gets access to your entire digital life?” — and that answer depends on identity, context, and compliance, not which IDE is trendiest this month.