Google Gemini 3.1 Pro doubles reasoning performance in benchmark


Google’s latest Gemini 3.1 Pro arrives with a promise that feels almost too tidy: a “Deep Think Mini” that can toggle its reasoning depth at will. The model’s name suggests an incremental step, yet the company’s own numbers hint at something louder. While the tech is impressive on paper, the real test comes from how it fares on established yardsticks.

On ARC‑AGI‑2, a benchmark designed to probe a system's capacity for novel, abstract problem‑solving, Gemini 3.1 Pro posts results that dwarf its predecessor's. That jump isn't just a modest gain; it's a shift that could reshape how developers think about on‑demand reasoning. If the figures hold up, the model may finally bridge the gap between flexible, lightweight chat and the heavyweight analytical chops traditionally reserved for larger, more specialized systems.

The following excerpt pulls the numbers straight from Google’s published benchmarks, laying out just how much the reasoning performance has moved.

**Benchmark Performance: More Than Doubling Reasoning Over 3 Pro**

Google's published benchmarks tell a story of dramatic improvement, particularly in areas associated with reasoning and agentic capability. On ARC-AGI-2, a benchmark that evaluates a model's ability to solve novel abstract reasoning patterns, 3.1 Pro scored 77.1%, more than double the 31.1% achieved by Gemini 3 Pro and substantially ahead of Anthropic's Sonnet 4.6 (58.3%) and Opus 4.6 (68.8%). On Humanity's Last Exam, a rigorous academic reasoning benchmark, 3.1 Pro achieved 44.4% without tools, up from 37.5% for 3 Pro and ahead of both Claude Sonnet 4.6 (33.2%) and Opus 4.6 (40.0%).
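
For quick reference, here are the same figures side by side (all numbers from Google's published benchmarks as quoted above; the ARC-AGI-2 gain works out to roughly 2.5x, since 77.1 / 31.1 ≈ 2.48):

| Benchmark | Gemini 3.1 Pro | Gemini 3 Pro | Claude Sonnet 4.6 | Claude Opus 4.6 |
| --- | --- | --- | --- | --- |
| ARC-AGI-2 | 77.1% | 31.1% | 58.3% | 68.8% |
| Humanity's Last Exam (no tools) | 44.4% | 37.5% | 33.2% | 40.0% |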

Google's Gemini 3.1 Pro arrives with three adjustable reasoning modes, a lightweight echo of its Deep Think system. The headline claim is clear: reasoning performance more than doubles compared to Gemini 3 Pro on the ARC‑AGI‑2 benchmark. Benchmarks show dramatic improvement in abstract reasoning and agentic capability, according to Google’s published data.
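
The article doesn't detail how those modes are exposed to developers. As a sketch only: if they surface through the google-genai SDK's existing thinking-budget control, picking a reasoning depth per request might look like the snippet below. The model ID "gemini-3.1-pro", the mode names, and the token budgets are illustrative assumptions, not published values.

```python
# Sketch: choosing a reasoning depth per request with the google-genai SDK.
# Assumptions (not confirmed by the article): the "gemini-3.1-pro" model ID
# and the idea that the three modes map onto the SDK's thinking_budget knob.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Hypothetical mapping of the three modes to thinking-token budgets.
REASONING_MODES = {"low": 1024, "medium": 8192, "high": 32768}

def ask(prompt: str, mode: str = "medium") -> str:
    """Send a prompt with the requested reasoning depth and return the text."""
    response = client.models.generate_content(
        model="gemini-3.1-pro",  # assumed model ID
        contents=prompt,
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(
                thinking_budget=REASONING_MODES[mode]
            )
        ),
    )
    return response.text

print(ask("Summarize the trade-off between speed and reasoning depth.", mode="high"))
```

The appeal of per-request depth is that routine calls stay cheap and fast while hard problems get the full budget, which is the flexibility the "adjustable cognition" framing implies.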

Yet the article notes that three months is a long time in AI, and competitors have not been idle. How these gains will manifest in everyday applications remains uncertain. The model’s “Deep Think Mini” label suggests a trade‑off between speed and depth, but the balance is not fully detailed.

Google’s own figures provide a snapshot, but independent verification is absent. Consequently, while the numbers are impressive on paper, the practical impact is still to be measured. The update underscores Google’s focus on adjustable cognition, but whether this approach will sustain its edge as other firms iterate is unclear.
