Anthropic's Sonnet 4.6 hits 79.6% on SWE-bench, costs one‑fifth of Opus
Why does this matter? Because Anthropic just put a price tag on flagship‑level coding ability. Sonnet 4.6, the company's latest model, claims to deliver performance that rivals its higher‑priced sibling, Opus 4.6, at roughly one fifth of the cost.
That cost gap could tip the scales for businesses weighing AI‑driven development tools against traditional engineering spend. While the numbers sound impressive on paper, the real test is whether the model holds up on benchmarks that matter to developers today. The SWE‑bench Verified suite, widely used to gauge real‑world software coding skill, and the OSWorld‑Verified agentic computer‑use test are two such yardsticks.
If Sonnet 4.6 can stay competitive on those fronts, the economics of AI‑assisted coding may shift dramatically. Below, Anthropic’s own benchmark table lays out the details.
The benchmark table Anthropic released paints a striking picture. On SWE-bench Verified, the industry-standard test for real-world software coding, Sonnet 4.6 scored 79.6% -- nearly matching Opus 4.6's 80.8%. On agentic computer use (OSWorld-Verified), Sonnet 4.6 scored 72.5%, essentially tied with Opus 4.6's 72.7%.
On office tasks (GDPval-AA Elo), Sonnet 4.6 actually scored 1633, surpassing Opus 4.6's 1606. On agentic financial analysis, Sonnet 4.6 hit 63.3%, beating every model in the comparison, including Opus 4.6 at 60.1%. In many of the categories enterprises care about most, Sonnet 4.6 matches or beats models that cost five times as much to run.
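For context on what that Elo gap means in practice, here is a minimal sketch that converts the 27-point margin into an expected head-to-head win rate. It assumes the GDPval-AA leaderboard follows the classic Elo expected-score formula, which is our assumption rather than something stated in the benchmark documentation.

```python
# Convert the reported GDPval-AA Elo gap (1633 vs. 1606) into an expected
# head-to-head win rate, assuming the classic Elo model applies.
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

p = elo_expected_score(1633, 1606)  # Sonnet 4.6 vs. Opus 4.6
print(f"Expected win rate for Sonnet 4.6: {p:.1%}")  # roughly 54%
```

In other words, a 27-point Elo edge translates into only a modest expected advantage, consistent with the framing of the two models as roughly comparable on office tasks.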
An enterprise running an AI agent that processes 10 million tokens per day previously had to choose between inferior results at lower cost and superior results at rapidly scaling expense; Sonnet 4.6 is pitched as collapsing that trade-off (a back-of-the-envelope cost comparison follows below). In Claude Code, early testing found that users preferred Sonnet 4.6 over Sonnet 4.5 roughly 70% of the time. Users even preferred Sonnet 4.6 to Opus 4.5, Anthropic's frontier model from November, 59% of the time.
They rated Sonnet 4.6 as significantly less prone to over-engineering and "laziness," and meaningfully better at instruction following.
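To make the "rapidly scaling expense" point concrete, the sketch below prices a 10-million-token-per-day workload under two hypothetical per-million-token rates. The dollar figures are placeholders, not Anthropic's published prices; the only relationship carried over from the article is the roughly five-to-one cost ratio between Opus 4.6 and Sonnet 4.6.

```python
# Back-of-the-envelope cost comparison for an agent processing
# 10 million tokens per day. The per-million-token prices are
# HYPOTHETICAL placeholders; only the ~5x ratio comes from the article.
DAILY_TOKENS = 10_000_000

ASSUMED_PRICE_PER_MTOK = {
    "Opus 4.6 (assumed rate)": 25.00,
    "Sonnet 4.6 (assumed rate)": 5.00,  # one fifth of the flagship rate
}

for model, price in ASSUMED_PRICE_PER_MTOK.items():
    daily = DAILY_TOKENS / 1_000_000 * price
    print(f"{model:<26} ${daily:8,.2f}/day   ${daily * 30:10,.2f}/month")
```

At that volume the gap compounds quickly: whatever the real rates turn out to be, a five-fold difference per token is the difference between a rounding error and a budget line item.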
Will enterprises shift to Sonnet 4.6? The numbers suggest a compelling case. Scoring 79.6% on SWE‑bench, the model trails Opus 4.6 by just 1.2 points while costing only a fifth of the price, a gap that could influence budgeting decisions.
Its 72.5% result on OSWorld‑Verified shows parity in agentic computer use, reinforcing the claim of near‑flagship capability across coding, long‑context reasoning, and design tasks. Yet the benchmark table alone can’t confirm real‑world performance under diverse workloads, and the beta status of the 1 million‑token context window leaves its stability unproven. Anthropic’s positioning of Sonnet 4.6 as the default model signals confidence, but adoption will depend on how firms evaluate cost savings against any potential trade‑offs in reliability or support.
The upgrade across multiple domains is notable, but whether it will translate into broader corporate uptake remains uncertain. For now, the data present a clear, if cautious, indication that a lower‑cost alternative can approach flagship metrics without overtly compromising key benchmarks.
Further Reading
- Claude Sonnet 5 vs Opus 4.6 - Byte Bot
- Claude Opus 4.6 vs 4.5 Benchmarks (Explained) - Vellum
- Anthropic Claude Opus 4.6: Is the Upgrade Worth It? - Codecademy
- Introducing Claude Opus 4.6 - Anthropic
Common Questions Answered
How does Claude Opus 4.5 perform on the SWE-bench Verified benchmark?
Claude Opus 4.5 achieved an unprecedented 80.9% on the SWE-bench Verified benchmark, making it the first AI model to exceed 80% and to surpass all human engineering candidates. This milestone represents a significant breakthrough in AI coding capabilities, outperforming competitors like GPT-5.1 (74.2%) and Gemini 3 Pro (71.8%).
What makes Claude Opus 4.5's pricing unique in the AI coding assistant market?
Claude Opus 4.5 is priced at $5 per million input tokens and $25 per million output tokens, a 66% reduction from Anthropic's previous pricing. Additional cost savings are available through prompt caching (up to 90%) and batch processing (50%), making advanced AI coding capabilities more accessible to a broader range of developers and enterprises.
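As a rough illustration of how those discounts stack, the sketch below applies the 90% prompt-caching saving to the cached share of input tokens and the 50% batch discount to the overall bill. The $5/$25 per-million rates come from the answer above; the token volumes, cache-hit ratio, and stacking rules are illustrative assumptions, and real billing (for example, cache-write surcharges) may differ.

```python
# Rough cost estimator for the discounts described above.
# Rates are the $5 input / $25 output per-million figures quoted above;
# the cached_fraction, token counts, and stacking rules are assumptions.
INPUT_RATE = 5.0    # $ per million input tokens
OUTPUT_RATE = 25.0  # $ per million output tokens

def estimate_cost(input_mtok: float, output_mtok: float,
                  cached_fraction: float = 0.0, batch: bool = False) -> float:
    cached = input_mtok * cached_fraction          # input served from cache
    fresh = input_mtok - cached                    # input billed at full rate
    cost = (fresh * INPUT_RATE
            + cached * INPUT_RATE * 0.10           # 90% saving on cached input
            + output_mtok * OUTPUT_RATE)
    return cost * (0.5 if batch else 1.0)          # 50% batch discount

# Example: 8M input tokens (70% cache hits) plus 2M output tokens.
print(f"${estimate_cost(8, 2, cached_fraction=0.7, batch=True):,.2f}")
```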
What are the key technical innovations in Claude Opus 4.5?
The model introduces several technical innovations, including new compression algorithms that reduce input requirements by 30% while maintaining quality, and an innovative 'effort' parameter that allows developers to adjust reasoning intensity. Additionally, the model provides native-level support for multiple programming languages including Python, JavaScript, TypeScript, Java, C++, Go, and Rust.
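For developers curious how the 'effort' knob might look in code, here is a minimal sketch using the official anthropic Python SDK. The messages.create call and the extra_body escape hatch are standard SDK features, but the parameter name, its accepted values, and the model ID shown here are assumptions drawn from the description above rather than confirmed API details; check Anthropic's current documentation before relying on them.

```python
# Minimal sketch: requesting lower reasoning intensity via an 'effort'
# setting. The field name, allowed values, and model ID are ASSUMPTIONS
# based on the description above, passed through extra_body so the SDK
# forwards them as-is; consult Anthropic's docs for the real parameter.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5",  # illustrative model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this diff in two sentences."}],
    extra_body={"effort": "low"},  # assumed knob for reasoning intensity
)
print(response.content[0].text)
```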