

New GPT‑5.4 and Claude Opus 4.6 excel in coding, math, research


Why does the split between “hard‑core” and “hand‑hold” AI matter right now? One camp is busy feeding the newest language models into tools that developers already trust—think OpenAI’s GPT‑5.4 Thinking or Anthropic’s Claude Opus 4.6 paired with Codex or Claude Code. The other camp still leans on older versions for everyday chat.

The contrast is stark: the same family of models that can now draft research papers or solve differential equations still stumbles over simple, off‑the‑cuff questions. Andrej Karpathy notes that the leap in programming, math and scholarly assistance has been "massive this year," hinting at a shift in what professionals can automate. Yet the gap raises a practical question for anyone watching the field: can the tools that excel in technical work be trusted for the casual queries that keep users coming back?

The answer, according to recent observations, sits somewhere between the two extremes.

The professional camp uses the latest models, such as OpenAI's GPT-5.4 Thinking or Claude Opus 4.6, inside capable harnesses like Codex or Claude Code for programming, math, and research work. Progress in these areas has been massive this year, Karpathy says, with models now capable of autonomously restructuring entire codebases or hunting down security vulnerabilities on their own. In his view, these two groups are basically talking past each other.
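To give a rough sense of what a "harness" means here, the sketch below shows the general shape of an agentic coding loop: propose a change, run checks, feed the results back, repeat. It is a simplified illustration under stated assumptions, not how Codex or Claude Code actually work internally; call_model and run_checks are hypothetical stand-ins for a model API and a test runner.

```python
# Simplified sketch of an agentic coding "harness" loop (illustrative only;
# not the actual Codex or Claude Code implementation).

def call_model(prompt: str) -> str:
    """Hypothetical model call: returns a proposed code change for the prompt."""
    return "# proposed patch goes here"

def run_checks(patch: str) -> tuple[bool, str]:
    """Hypothetical checker: applies the patch, runs tests, returns (passed, log)."""
    return False, "tests not implemented"

def harness(task: str, max_steps: int = 10) -> str | None:
    """Iterate: propose a change, verify it, and feed failures back to the model."""
    feedback = ""
    for _ in range(max_steps):
        patch = call_model(f"Task: {task}\nFeedback from last attempt: {feedback}")
        passed, log = run_checks(patch)
        if passed:
            return patch      # checks pass: accept the change
        feedback = log        # otherwise retry with the failure output
    return None               # stop after the step budget is exhausted
```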

"It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) 'Advanced Voice Mode' will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems."

Karpathy via X

Karpathy's take points to something bigger: areas like code or math, where you can clearly check whether an answer is right or wrong and reinforce it directly through reinforcement learning with verifiable rewards, are seeing larger and, crucially, more measurable gains from AI progress than fuzzy domains like writing or consulting, where there is no clean metric to optimize against.

Why verifiability drives AI progress

This raises a core question in AI research right now: can general intelligence actually emerge from language models, or can these models only be tuned to perform well within specific domains?
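To make the verifiable-reward idea concrete, here is a minimal sketch assuming a coding task where the model must produce a function named solve: the output is scored by actually running it against known test cases, so the reward signal is unambiguous in a way that has no analogue for writing or consulting. The entry-point name, test format, and pass/fail scoring are illustrative assumptions, not any lab's actual training setup.

```python
# Minimal sketch of a "verifiable reward": the model's code is scored by
# executing it against known test cases, so the signal is unambiguous.
# The entry-point name `solve` and the test format are illustrative assumptions.

def verifiable_reward(candidate_code: str, test_cases: list[tuple[tuple, object]]) -> float:
    """Return 1.0 only if the generated `solve` function passes every test case."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)      # define the candidate function
        solve = namespace["solve"]
        for args, expected in test_cases:
            if solve(*args) != expected:
                return 0.0                   # any wrong answer: no reward
    except Exception:
        return 0.0                           # crashes or missing function: no reward
    return 1.0

# Example: the reward is 1.0 only when the generated code is actually correct.
tests = [((3,), 3), ((-4,), 4), ((0,), 0)]
print(verifiable_reward("def solve(x):\n    return abs(x)", tests))  # 1.0
print(verifiable_reward("def solve(x):\n    return x", tests))       # 0.0
```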

Do the newest models finally deliver on their promises? Karpathy says they excel where it counts: programming, mathematics, and research tasks. GPT‑5.4 Thinking and Claude Opus 4.6 can churn out complex code in hours, a speed that professional users find valuable.

Yet the same systems stumble on everyday, casual queries, producing errors that casual users notice. This split performance explains why the free‑tier ChatGPT experience often leaves a different impression than the one formed by developers using Codex or Claude Code. The contrast is not a paradox, according to Karpathy, but a reflection of how the models are tuned for specific, high‑stakes workloads.

Massive progress this year has made autonomous code generation feasible, though the article stops short of detailing its limits. It remains unclear whether future updates will close the gap in general‑purpose conversation without sacrificing specialized strength. For now, the evidence points to tools that shine in technical domains while remaining fragile in simple dialogue.


Common Questions Answered

How are GPT-5.4 Thinking and Claude Opus 4.6 transforming professional coding and research tasks?

These latest AI models can autonomously restructure entire codebases and independently hunt down security vulnerabilities, dramatically accelerating professional development workflows. According to Karpathy, progress in programming, mathematics, and research capabilities has been massive this year, with models now capable of completing complex technical tasks in hours.

Why do GPT-5.4 Thinking and Claude Opus 4.6 perform differently in professional versus casual contexts?

While these models excel in specialized domains like programming, mathematics, and research, they often struggle with simple, off-the-cuff conversational queries. This performance split creates a stark contrast between the experiences of professional users who see remarkable technical capabilities and casual users who might encounter more inconsistent interactions.

What makes the latest AI models like GPT-5.4 and Claude Opus 4.6 significant for professional development?

These advanced models can draft complex research papers, solve differential equations, and generate sophisticated code with remarkable speed and accuracy. Professional developers find immense value in their ability to complete technical tasks in hours that would traditionally take much longer.