Artificial Analysis overhauls AI Intelligence Index; GPT-5.2 beats or ties professionals on 70.9% of tasks
Artificial Analysis has just reshaped its AI Intelligence Index, swapping out the old benchmark suite for a set of “real‑world” tests. The move follows a broader push to gauge language models on tasks that mirror everyday professional work rather than abstract puzzles. While the new framework promises a clearer picture of how models perform on concrete jobs, the numbers it produces are already sparking conversation.
OpenAI’s latest release, GPT‑5.2, was run through the original GDPval evaluation—a benchmark that pits the model against seasoned practitioners across a range of occupations. The results, released alongside the index overhaul, suggest the system is not merely competitive but often ahead of human experts. This claim, backed by a detailed breakdown of performance across 44 distinct roles, is meant to demonstrate that the model can handle well‑specified knowledge work at scale.
The stakes are high: if the figures hold up, they could reshape expectations about what AI can reliably do in professional settings.
On the original GDPval evaluation, GPT-5.2 beat or tied top industry professionals on 70.9% of well-specified tasks, according to OpenAI. The company claims GPT-5.2 "outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations," with companies including Notion, Box, Shopify, Harvey, and Zoom observing "state-of-the-art long-horizon reasoning and tool-calling performance."
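The headline metric itself is simple arithmetic: expert graders compare a model's deliverable against a professional's deliverable for the same task, and the score is the share of tasks where the model wins or ties. The toy sketch below (invented sample data, not OpenAI's grading code) shows how such a beat-or-tie rate is computed.

```python
# Toy illustration of a GDPval-style "beat or tie" rate computed
# from blind pairwise grader verdicts. The sample data is invented.
from collections import Counter

# Each verdict records which deliverable an expert grader preferred:
# "model", "professional", or "tie".
verdicts = ["model", "tie", "professional", "model", "tie"]

counts = Counter(verdicts)
beat_or_tie = (counts["model"] + counts["tie"]) / len(verdicts)
print(f"Beat-or-tie rate: {beat_or_tie:.1%}")  # 80.0% on this toy sample
```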
The emphasis on economically measurable output is a philosophical shift in how the industry thinks about AI capability. Rather than asking whether a model can pass a bar exam or solve competition math problems (achievements that generate headlines but don't necessarily translate to workplace productivity), the new benchmarks ask whether AI can actually do jobs.

Graduate-level physics problems expose the limits of today's most advanced AI models

While GDPval-AA measures practical productivity, another new evaluation called CritPT reveals just how far AI systems remain from true scientific reasoning.
Does the new Intelligence Index finally solve the measurement problem? Artificial Analysis says its v4.0 framework, built on ten real-world evaluations, moves beyond the fast-saturating benchmarks that have long plagued the field. Yet the shift raises questions about consistency: how will the new tests compare to legacy scores across diverse models?
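Artificial Analysis has not published the exact aggregation formula in the material covered here, so the sketch below assumes the simplest possible scheme, an equal-weight mean over normalized 0-100 scores, with placeholder evaluation names rather than the actual v4.0 suite.

```python
# A minimal sketch of composite-index aggregation, assuming (not confirmed
# by Artificial Analysis) an equal-weight mean over normalized 0-100 scores.
# Evaluation names and values here are placeholders, not the real v4.0 suite.
scores = {
    "agentic_coding": 62.0,
    "long_horizon_tool_use": 58.5,
    "document_analysis": 71.2,
    # ... seven further real-world evaluations would complete the set of ten
}

intelligence_index = sum(scores.values()) / len(scores)
print(f"Composite index: {intelligence_index:.1f}")
```

Whatever the real weighting, any such composite inherits the question raised above: scores on the new evaluations are not directly interchangeable with scores on the suite they replace.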
Still, the headline numbers deserve scrutiny. The 70.9% beat-or-tie figure and the 44-occupation claim come from a single vendor's reporting, and the broader community has not yet independently validated the results.
Moreover, the definition of “well‑specified” remains vague, leaving it unclear whether the tasks capture the full complexity of real‑world work. Artificial Analysis's overhaul may provide a more nuanced picture, but whether it will become the de facto standard for measuring AI progress is uncertain. For now, the numbers invite cautious optimism tempered by the need for transparent verification.
Further Reading
- GPT-5.2 Review: Benchmarks (AIME 100%), Visual AI ... - Vertu
- GPT-5.2 Benchmarks - Vellum AI
- How GPT-5.2 stacks up against Gemini 3.0 and Claude Opus 4.5 - RDWorld Online
- GPT-5.2: Pricing, Context Window, Benchmarks, and More - LLM Stats
Common Questions Answered
How does the new AI Intelligence Index differ from the previous benchmark suite?
The new AI Intelligence Index, introduced by Artificial Analysis, replaces the old benchmark suite with ten real‑world evaluations that simulate everyday professional tasks. This shift aims to assess language models on concrete job‑related performance rather than abstract puzzles, providing a clearer picture of practical capabilities.
What percentage of well‑specified tasks did GPT‑5.2 beat or tie with top industry professionals in the original GDPval evaluation?
According to OpenAI, GPT‑5.2 beat or tied top industry professionals on 70.9% of well‑specified tasks in the original GDPval evaluation. This result covers a range of knowledge‑work tasks across 44 occupations, indicating strong performance relative to human experts.
Which companies reported state‑of‑the‑art long‑horizon reasoning and tool‑calling performance from GPT‑5.2?
Companies such as Notion, Box, Shopify, Harvey, and Zoom observed GPT‑5.2 delivering state‑of‑the‑art long‑horizon reasoning and tool‑calling performance. Their feedback highlights the model’s ability to handle complex, multi‑step tasks in real‑world settings.
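None of these companies has published their integration code, so the following is only a schematic of the tool-calling loop such deployments rely on: the model either requests a tool or answers, and a harness dispatches the call and feeds the result back. `model_step`, `get_weather`, and the message format are all hypothetical stand-ins, not any vendor's actual API.

```python
# Schematic tool-calling loop. `model_step` is a hypothetical stand-in for
# a real model API call; only the dispatch pattern is the point here.
import json

def get_weather(city: str) -> str:
    # Stubbed tool: a real deployment would call an external service.
    return json.dumps({"city": city, "temp_c": 18})

TOOLS = {"get_weather": get_weather}

def model_step(history: list) -> dict:
    # Stand-in logic: request a tool once, then answer using its result.
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_call", "name": "get_weather",
                "arguments": {"city": "Berlin"}}
    return {"type": "answer", "content": "It is 18 °C in Berlin."}

history = [{"role": "user", "content": "Weather in Berlin?"}]
while True:
    step = model_step(history)
    if step["type"] == "tool_call":
        # Dispatch the requested tool and feed the result back to the model.
        result = TOOLS[step["name"]](**step["arguments"])
        history.append({"role": "tool", "content": result})
    else:
        print(step["content"])
        break
```

Long-horizon performance is essentially a question of how many such iterations a model can chain together without losing the thread.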
What concerns remain about the consistency of the new Intelligence Index compared to legacy benchmarks?
While the v4.0 framework of the Intelligence Index promises more relevant measurements, analysts question how its scores will align with legacy benchmarks across diverse models. The concern centers on whether the new real‑world tests can be directly compared to older evaluation metrics without losing continuity.
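One concrete way to check that continuity (not something Artificial Analysis has said it does) would be to rank-correlate models' legacy index scores against their v4.0 scores; the model names and scores below are invented for illustration.

```python
# Rank-correlating legacy vs. v4.0 scores as a continuity check.
# All data below is invented; requires scipy.
from scipy.stats import spearmanr

legacy = {"model_a": 71, "model_b": 65, "model_c": 58, "model_d": 52}
v4 = {"model_a": 66, "model_b": 61, "model_c": 60, "model_d": 48}

names = sorted(legacy)
rho, p = spearmanr([legacy[n] for n in names], [v4[n] for n in names])
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
# A high rho would suggest the new tests preserve the old ranking;
# a low rho would signal a genuine reshuffle of the leaderboard.
```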