GPT-5.2 Beats Pros on 70.9% of Tasks, OpenAI Study Shows

Analysis overhauls AI Index; GPT-5.2 beats professionals on 70.9% of tasks

In a study that could reshape how we understand artificial intelligence's capabilities, OpenAI has released research challenging traditional assumptions about professional competence. The analysis, focused on GPT-5.2's performance across multiple professional domains, suggests a potentially major shift in workplace dynamics.

The research zeroes in on a critical question: Can AI genuinely match human expertise across diverse professional tasks? By systematically evaluating performance across 44 different occupations, OpenAI has produced data that might make knowledge workers sit up and take notice.

Initial findings are striking. The study does not suggest merely marginal or incremental gains; it indicates a substantial leap in AI's ability to handle complex, well-defined professional tasks. Companies like Notion are already reporting their own observations of the model, signaling serious interest in benchmarking AI capabilities.

What emerges is more than just a technical report. It's a potential preview of how artificial intelligence might fundamentally transform professional work in the coming years.

On the original GDPval evaluation, GPT-5.2 beat or tied top industry professionals on 70.9% of well-specified tasks, according to OpenAI. The company claims GPT-5.2 "outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations," with companies including Notion, Box, Shopify, Harvey, and Zoom observing "state-of-the-art long-horizon reasoning and tool-calling performance."

The emphasis on economically measurable output is a philosophical shift in how the industry thinks about AI capability. Rather than asking whether a model can pass a bar exam or solve competition math problems (achievements that generate headlines but don't necessarily translate to workplace productivity), the new benchmarks ask whether AI can actually do jobs.

Graduate-level physics problems expose the limits of today's most advanced AI models

While GDPval-AA measures practical productivity, another new evaluation called CritPT reveals just how far AI systems remain from true scientific reasoning.

OpenAI's latest analysis suggests a significant milestone for AI capabilities. GPT-5.2 has demonstrated remarkable performance, matching or outperforming professionals on nearly 71% of well-specified tasks across 44 different occupations.
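The headline figure is a win-or-tie rate: the share of tasks on which the model's output was judged at least as good as the professional's. A minimal sketch of that arithmetic follows, assuming one judgment per task. The function name, the labels, and the sample data are all invented for illustration and are not OpenAI's data or scoring code.

```python
from collections import Counter


def win_or_tie_rate(judgments):
    """Return the fraction of tasks judged 'win' or 'tie' for the model.

    `judgments` is a hypothetical list with one label per task:
    'win', 'tie', or 'loss' (model versus professional).
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    counts = Counter(judgments)
    return (counts["win"] + counts["tie"]) / len(judgments)


# Invented sample of 10 tasks: 5 wins, 2 ties, 3 losses.
sample = ["win", "win", "tie", "loss", "win",
          "tie", "loss", "win", "win", "loss"]

print(f"{win_or_tie_rate(sample):.1%}")  # 7 of 10 tasks -> 70.0%
```

Note that ties count toward the total, so a figure like 70.9% does not by itself say how often the model was judged strictly better.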

Major tech companies like Notion, Box, and Shopify have already observed the system's advanced reasoning and tool-calling abilities. This isn't just theoretical progress - it's practical performance that could reshape knowledge work.

The data points to a nuanced reality: AI isn't replacing humans wholesale, but excelling in specific, well-defined tasks. Companies are witnessing state-of-the-art performance that challenges traditional workforce expectations.

Still, questions remain about the depth and breadth of these capabilities. The GDPval evaluation provides a structured glimpse into AI's potential, but real-world complexity often differs from controlled assessments.

The boundary between AI and human professional work continues to blur. For industries spanning technology, creative fields, and knowledge work, GPT-5.2 represents a provocative indicator of technological progression.

Common Questions Answered

How did GPT-5.2 perform across professional tasks in OpenAI's evaluation?

According to OpenAI's research, GPT-5.2 beat or tied top industry professionals on 70.9% of well-specified tasks. The system demonstrated exceptional performance across 44 different occupations, showcasing advanced long-horizon reasoning and tool-calling capabilities.

Which major tech companies have observed GPT-5.2's performance?

Companies including Notion, Box, Shopify, Harvey, and Zoom have directly observed GPT-5.2's performance. According to OpenAI, these organizations reported state-of-the-art long-horizon reasoning and tool-calling performance.

What makes GPT-5.2's performance significant for workplace dynamics?

GPT-5.2's ability to match or outperform professionals on nearly 71% of well-specified tasks suggests a potentially transformative shift in how knowledge work is conducted. The research challenges traditional assumptions about professional competence and indicates that AI could fundamentally reshape workplace productivity and task execution.