DeepSeekMath‑V2 Wins Gold at IMO 2025, Tops China Math Olympiad
When I first saw the headline about an AI winning gold at the International Mathematical Olympiad 2025, I thought it was a one-off stunt. Turns out DeepSeekMath-V2, the newest model from DeepSeek, has been popping up in places I didn't expect. It not only hit a gold-level score at the IMO, it also took the top spot in the China Mathematical Olympiad - widely regarded as China's toughest pre-university math contest.
And then there are the Putnam 2024 results: on a contest that usually separates the very best undergrads from the rest, the model posted a near-perfect score. Those three achievements span international high-school, national, and undergraduate levels, which makes it hard to argue the system is just a narrow trick. Maybe the model is learning something deeper about math, or perhaps the problems are becoming more amenable to pattern-based approaches - it's still unclear.
Still, seeing a single AI perform so well across such different challenges feels… unsettling, and worth keeping an eye on.
Beyond the IMO 2025 result, DeepSeekMath-V2 also achieved top-tier performance on China's toughest national competition, the China Mathematical Olympiad (CMO), and posted near-perfect results on the undergraduate Putnam exam. "On Putnam 2024, the preeminent undergraduate mathematics competition, our model solved 11 of 12 problems completely and the remaining problem with minor errors, scoring 118/120 and surpassing the highest human score of 90," stated DeepSeek.

DeepSeek argues that recent AI models excel at getting the right answers in final-answer benchmarks like AIME and HMMT but often lack sound reasoning: "Many mathematical tasks like theorem proving require rigorous step-by-step derivation rather than numerical answers, making final answer rewards inapplicable." To address this, DeepSeek emphasises the need for models that can judge and refine their own reasoning.
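To make that distinction concrete, here is a minimal sketch of the two reward styles. Everything below is illustrative: the function names and the per-step verifier interface are my own assumptions, not DeepSeek's published training code.

```python
# Sketch: final-answer rewards vs. verifier-style rewards for proofs.
# All names here are hypothetical illustrations, not DeepSeek's actual API.
from typing import Callable, List


def final_answer_reward(model_answer: str, ground_truth: str) -> float:
    """Reward used by answer-based benchmarks like AIME/HMMT:
    1.0 if the final answer matches, 0.0 otherwise. It says nothing
    about whether the reasoning in between was sound."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0


def verifier_reward(proof_steps: List[str],
                    verify_step: Callable[[str], bool]) -> float:
    """Reward for proof-style tasks: a verifier (model- or human-based)
    checks each derivation step, and the proof earns credit only up to
    the first flawed step. `verify_step` stands in for whatever judge
    checks a single inference."""
    if not proof_steps:
        return 0.0
    valid = 0
    for step in proof_steps:
        if not verify_step(step):
            break
        valid += 1
    return valid / len(proof_steps)
```

The point of the second function is that a proof with a correct final line but a broken middle step scores low, which is exactly the failure mode a final-answer reward can't see.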
DeepSeekMath-V2 appears to have taken on some of the toughest Olympiad problems. The lab's report lists a gold-level score at IMO 2025 - five of six problems solved - and mentions top-tier results on the China Mathematical Olympiad plus near-perfect marks on the 2024 Putnam. Those numbers hint at a jump in theorem-proving for an open-weight model, but the paper is silent on how the tests were run and gives no breakdown of the one IMO problem the model missed.
Clement Delangue of Hugging Face called the idea of “owning the brain of one of the best mathematicians…for free” exciting, but offered no technical evidence. The write-up also stops short of putting the system side by side with human contestants beyond the headline figures. So, while the scores look impressive, it's still unclear whether the model can handle a wider range of mathematical reasoning.
We’ll probably need independent benchmarks to see if DeepSeekMath-V2’s performance is a lasting step forward or just a narrow win on a few contests.
Common Questions Answered
What gold‑level achievement did DeepSeekMath‑V2 obtain at IMO 2025?
DeepSeekMath‑V2 earned a gold medal at the International Mathematical Olympiad 2025 by correctly solving five of the six competition problems. This performance placed the model among the top human participants and demonstrated its advanced theorem‑proving capabilities.
How did DeepSeekMath‑V2 perform on the China Mathematical Olympiad (CMO)?
The model topped the China Mathematical Olympiad, which is widely regarded as the nation’s toughest pre‑university mathematics contest. Its victory indicates that DeepSeekMath‑V2 can handle the most challenging national-level problems in addition to international ones.
What score did DeepSeekMath‑V2 achieve on the Putnam 2024 exam and how does it compare to the highest human score?
DeepSeekMath‑V2 scored 118 out of 120 on the 2024 Putnam exam, solving 11 of the 12 problems completely and making only minor errors on the last one. Since each Putnam problem is worth 10 points, that works out to 110 points for the complete solutions plus 8 on the remaining problem. This result surpasses the highest recorded human score of 90, highlighting a substantial performance gap between the model and top undergraduate competitors.
What limitations or missing details are noted about DeepSeekMath‑V2’s reported results?
The announcement does not provide details on the evaluation methodology used for the IMO, CMO, or Putnam results, leaving the exact testing conditions unclear. Additionally, the model’s performance on the single unsolved IMO problem was not disclosed, which limits a full assessment of its capabilities.