
Gemini 3 Pro shows clear lead in coding, math and creative writing


Google’s AI team just dropped a new model, and it’s already buzzing through the LLM crowd. After a few weeks of side-by-side tests, Gemini 3 Pro seemed to pull ahead in three workhorse tasks: writing code, solving math problems and cranking out creative prose. Those three line up neatly with what developers, analysts and content creators actually do every day, so the gap feels tangible.

Even more eye-catching was its showing on visual-input tests, where it snagged the top spot and hinted at a wider grasp of multimodal data than many of its rivals. Other labs have been touting modest bumps lately, but Gemini 3 Pro’s numbers look strong enough to spark a new wave of head-to-head comparisons. The results also suggest the way we measure “agentic” coding skill might be shifting, with this model nudging past some long-standing heavyweights.

All of that leads right into what Chiang told The Verge…

Chiang told The Verge that Gemini 3 Pro holds a "clear lead" in occupational categories including coding, math, and creative writing, and its agentic coding abilities "in many cases now surpass top coding models like Claude 4.5 and GPT-5.1." It also took the top spot on visual comprehension and was the first model to surpass a ~1500 score on the platform's text leaderboard. The new model's performance, Chiang said, "illustrates that the AI arms race is being shaped by models that can reason more abstractly, generalize more consistently, and deliver dependable results across an increasingly diverse set of real-world evaluations." Alex Conway, principal software engineer at DataRobot, told The Verge that one of Gemini 3's most notable advancements was on a specific reasoning benchmark called ARC-AGI-2.


Gemini 3 Pro’s numbers really stand out. Chiang told The Verge the model has a “clear lead” in coding, math and creative writing, and its agentic coding skills “in many cases now surpass top coding models like Claude 4.5 and GPT-5.1.” It also topped the visual-comprehension chart. Still, the hype (the “Holy shit” memes and the long-form write-ups) doesn’t automatically mean it will dominate the market.

Users haven’t dumped other models yet, and the article points out that rivals are still “wowing” observers. I’m not sure the lead will hold; other systems could close the gap as they iterate. The report qualifies the advantage with “for now,” which leaves the longer-term outlook fuzzy.

While the leaderboard puts Gemini 3 Pro ahead, adoption trends are still shifting. Bottom line: the data shows a measurable edge, but whether that edge survives real-world use cases remains unclear.

Common Questions Answered

What professional tasks does Gemini 3 Pro lead in according to the benchmark?

Gemini 3 Pro emerged ahead of its peers in three core professional tasks: writing code, solving math problems, and generating creative prose. These categories align with the daily workloads of developers, analysts, and content creators who rely on AI.

How does Gemini 3 Pro's agentic coding ability compare to Claude 4.5 and GPT‑5.1?

According to Chiang's comments to The Verge, Gemini 3 Pro's agentic coding abilities "in many cases now surpass" top coding models like Claude 4.5 and GPT‑5.1. This suggests it can plan and execute coding tasks more effectively than those competing models in benchmark tests.

What milestone did Gemini 3 Pro achieve on the platform's text leaderboard?

Gemini 3 Pro was the first model to exceed a score of approximately 1500 on the platform's text leaderboard, marking a significant performance breakthrough. This high score reflects its strong capabilities across text‑based tasks.
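For context on what a ~1500 score means: head-to-head model leaderboards of this kind typically rank models with Elo-style pairwise ratings, where each user vote nudges the winner's score up and the loser's down. The article doesn't specify the platform's exact formula, so this is only an illustrative sketch of the classic Elo update; the starting rating of 1000 and the k-factor of 32 are assumptions, not the platform's actual parameters.

```python
# Hedged illustration of Elo-style pairwise ratings, as commonly used by
# head-to-head LLM leaderboards (assumed formula, not the platform's own).

def expected_score(r_a: float, r_b: float) -> float:
    """Modeled probability that model A beats model B, given their ratings."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, outcome: float, k: float = 32) -> tuple[float, float]:
    """Update both ratings after one head-to-head vote.

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    The update is zero-sum: whatever A gains, B loses.
    """
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (outcome - e_a)
    r_b_new = r_b + k * ((1.0 - outcome) - (1.0 - e_a))
    return r_a_new, r_b_new

# Two models start at an assumed baseline of 1000; one wins every vote,
# so its rating climbs while its opponent's falls by the same amount.
a, b = 1000.0, 1000.0
for _ in range(100):
    a, b = update(a, b, 1.0)
print(round(a), round(b))
```

The key intuition is that gains shrink as the gap widens: once a model is far ahead, each additional win moves its rating only slightly, so crossing a threshold like 1500 requires beating strong opponents consistently, not just racking up easy wins.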

Does Gemini 3 Pro's lead in visual comprehension guarantee market dominance?

While Gemini 3 Pro claimed the top spot on visual comprehension benchmarks, the article notes that this hype does not automatically translate into market dominance. Users are still employing other models, and rivals remain competitive despite Gemini's strong performance.