DeepSeekMath‑V2 Wins Gold at IMO 2025, Tops China Math Olympiad
Why does a language model’s performance in high‑school contests matter? Because the same system that clinched gold at the International Mathematical Olympiad this year is not stopping at a single event. DeepSeekMath‑V2, the latest release from DeepSeek, didn’t just earn a gold‑level score at IMO 2025; it also topped the China Mathematical Olympiad, a test widely regarded as the nation’s toughest pre‑university challenge.
While the IMO win grabbed headlines, the model’s result on the Putnam exam, an undergraduate contest on the other side of the competition spectrum that pits the brightest college students against each other, adds another layer of relevance. A near‑perfect score on Putnam 2024 suggests the system can handle problems that demand not only clever insight but also a depth of mathematical maturity. Here’s the thing: when a single AI can excel across these three very different arenas, the claim that it’s merely a specialized tool starts to look shaky.
The following quote lays out the specifics of those achievements.
Besides the IMO 2025 competition, DeepSeekMath-V2 also achieved top-tier performance on China's toughest national competition, the China Mathematical Olympiad (CMO), and posted near-perfect results on the undergraduate Putnam exam. "On Putnam 2024, the preeminent undergraduate mathematics competition, our model solved 11 of 12 problems completely and the remaining problem with minor errors, scoring 118/120 and surpassing the highest human score of 90," stated DeepSeek. DeepSeek argues that recent AI models excel at getting the right answers (in math benchmarks like AIME and HMMT) but often lack sound reasoning. "Many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable." To address this, DeepSeek emphasises the need for models that can judge and refine their own reasoning.
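To make the idea of a model that can "judge and refine its own reasoning" concrete, the sketch below shows a minimal generate‑verify‑refine loop in Python. It is not DeepSeek's published training or inference code; the function names, the rigor threshold, and the `Verdict` structure are illustrative assumptions standing in for a proof generator and a learned verifier that grades step‑by‑step rigor rather than checking a final numerical answer.

```python
# Hypothetical sketch of a self-verifying proof loop (not DeepSeek's actual code).
# Assumes two black-box components: a proof generator and a learned verifier
# that returns a rigor score plus a critique of the weakest step.

from dataclasses import dataclass


@dataclass
class Verdict:
    score: float   # 0.0 (unsound) to 1.0 (fully rigorous)
    critique: str  # natural-language description of flaws found


def generate_proof(problem: str, feedback: str = "") -> str:
    """Placeholder for a call to a proof-generating model."""
    raise NotImplementedError


def verify_proof(problem: str, proof: str) -> Verdict:
    """Placeholder for a verifier model that grades step-by-step rigor."""
    raise NotImplementedError


def solve_with_self_verification(problem: str,
                                 max_rounds: int = 8,
                                 accept_threshold: float = 0.95) -> str:
    """Generate, grade, and refine a proof until the verifier accepts it.

    The acceptance signal here is the verifier's rigor score over the whole
    derivation, not a match against a final numerical answer, which is the
    distinction the DeepSeek quote draws for theorem-proving tasks.
    """
    feedback = ""
    best_proof, best_score = "", 0.0
    for _ in range(max_rounds):
        proof = generate_proof(problem, feedback)
        verdict = verify_proof(problem, proof)
        if verdict.score > best_score:
            best_proof, best_score = proof, verdict.score
        if verdict.score >= accept_threshold:
            break
        # Feed the critique back so the next draft targets the flagged gap.
        feedback = verdict.critique
    return best_proof
```

The design choice this sketch illustrates is that the reward comes from a judgment about the derivation itself, which is why final‑answer benchmarks like AIME and HMMT are described as an insufficient training signal for theorem proving.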
Did DeepSeekMath‑V2 truly crack the hardest Olympiad problems? The lab reports a gold‑level score at IMO 2025, with five of six questions solved, and claims top‑tier results on the China Mathematical Olympiad along with near‑perfect marks on the 2024 Putnam exam. Such figures suggest a leap in theorem‑proving ability for an open‑weight model.
Yet the announcement provides no detail on the evaluation methodology, nor on how the model performed on the single unsolved IMO problem. Clément Delangue of Hugging Face praised the prospect of “owning the brain of one of the best mathematicians…for free,” a comment that underscores excitement but offers no technical validation. Moreover, the report stops short of comparing the system to human competitors beyond the headline scores.
Consequently, while the numbers are impressive, it remains unclear whether the model’s success will translate to broader mathematical reasoning tasks. Future independent testing will be needed to confirm whether DeepSeekMath‑V2’s achievements represent a sustainable advance or a narrow victory on specific contests.
Further Reading
- How Does DeepSeekMath-V2 Achieve Self-Verifying Mathematical Reasoning - Dev.to
- DeepSeek has launched the DeepSeekMath-V2 model, focusing on self-verification training framework - Futunn News
- Advanced version of Gemini with Deep Think officially achieves gold medal standard at the International Mathematical Olympiad - Google DeepMind Blog
- Winning Gold at IMO 2025 with a Model-Agnostic Approach - arXiv
Common Questions Answered
What gold‑level achievement did DeepSeekMath‑V2 obtain at IMO 2025?
DeepSeekMath‑V2 earned a gold medal at the International Mathematical Olympiad 2025 by correctly solving five of the six competition problems. This performance placed the model among the top human participants and demonstrated its advanced theorem‑proving capabilities.
How did DeepSeekMath‑V2 perform on the China Mathematical Olympiad (CMO)?
The model achieved top‑tier performance on the China Mathematical Olympiad, which is widely regarded as the nation’s toughest pre‑university mathematics contest. That result indicates DeepSeekMath‑V2 can handle the most challenging national‑level problems in addition to international ones.
What score did DeepSeekMath‑V2 achieve on the Putnam 2024 exam and how does it compare to the highest human score?
DeepSeekMath‑V2 scored 118 out of 120 on the 2024 Putnam exam, solving 11 of the 12 problems completely and making only minor errors on the last one. This result surpasses the highest recorded human score of 90, highlighting a substantial performance gap between the model and top undergraduate competitors.
What limitations or missing details are noted about DeepSeekMath‑V2’s reported results?
The announcement does not provide details on the evaluation methodology used for the IMO, CMO, or Putnam results, leaving the exact testing conditions unclear. Additionally, the model’s performance on the single unsolved IMO problem was not disclosed, which limits a full assessment of its capabilities.