
DeepSeekMath‑V2 Generates and Verifies Proofs, Aiming to Pop US AI Bubble


DeepSeek’s newest open‑source model, DeepSeekMath‑V2, arrives with a bold claim: it can both craft mathematical proofs and check them without leaning on any external software. The release follows a series of “headline experiments” that aim to test whether a single system can handle the full cycle of reasoning, from conjecture to verification. While many AI projects still depend on separate tools for proof generation and validation, DeepSeek is betting on the model’s internal critique loop to close that gap.

The team frames the effort as a direct response to what they see as an overinflated U.S. AI market, hoping the self‑contained approach will expose weaknesses in the broader hype. For the tougher theorems, the architecture supposedly scales its test‑time computation, suggesting a flexible path forward.

If the model truly refines its own answers, the result could reshape how researchers evaluate AI‑driven mathematics—something the headline experiments are designed to prove.



In the headline experiments, a single DeepSeekMath‑V2 model is used for both generating proofs and verifying them, with performance coming from the model's ability to critique and refine its own solutions rather than from external math software. For harder problems, the system scales up test‑time compute, sampling and checking many candidate proofs in parallel to reach high confidence in a final solution.

Closing the gap with US labs

The release comes on the heels of similar news from OpenAI and Google DeepMind, whose unreleased models also achieved gold‑medal status at the IMO, accomplishments once thought to be out of reach for LLMs.
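The test‑time scaling described here resembles a best‑of‑n search: sample many candidate proofs, score each one with the same model acting as verifier, and keep the highest‑confidence candidate. The sketch below illustrates that pattern only; `generate_proof` and `verify_proof` are hypothetical stand‑ins, not DeepSeek's actual interface.

```python
# Illustrative best-of-n sketch of test-time scaling: sample several
# candidate proofs, score each with a verifier, keep the best one.
# generate_proof and verify_proof are hypothetical stand-ins.
import random


def generate_proof(problem: str, seed: int) -> str:
    # Stand-in for sampling one candidate proof from the model.
    return f"proof-{seed} of {problem}"


def verify_proof(problem: str, proof: str) -> float:
    # Stand-in for the same model scoring its own proof (0.0 to 1.0).
    return random.random()


def best_of_n(problem: str, n: int = 8) -> tuple[str, float]:
    # Generate n candidates, verify each, return the highest-scoring pair.
    candidates = [generate_proof(problem, seed) for seed in range(n)]
    scored = [(p, verify_proof(problem, p)) for p in candidates]
    return max(scored, key=lambda pair: pair[1])


proof, confidence = best_of_n("sum of first n odd numbers is n^2")
```

In a real system the verifier's score would come from the model's own critique of the candidate, and n would grow with problem difficulty.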

Notably, these models reportedly succeeded through general reasoning abilities rather than targeted optimizations for math competitions. If these advances prove genuine, it suggests language models are approaching a point where they can solve complex, abstract problems, traditionally considered a uniquely human skill. Still, little is known about the specifics of these models.

An OpenAI researcher recently mentioned that an even stronger version of their math model will be released in the coming months. DeepSeek's decision to publish technical details stands in stark contrast to the secrecy of OpenAI and Google. While the American giants keep their architectures under wraps, DeepSeek is laying its cards on the table, demonstrating that it is keeping pace with the industry's leading labs.

This transparency also doubles as a renewed attack on the Western AI economy, a play DeepSeek already executed successfully earlier this year. The strategy seems to be working: as the Economist reports, many US AI startups are now bypassing major US providers in favor of Chinese open-source models to cut costs.


Did DeepSeekMath‑V2 truly crack the toughest contests? The company says the model earned gold‑medal‑level scores at the 2025 International Mathematical Olympiad and the 2024 Chinese Mathematical Olympiad, and posted 118 out of 120 points on the Putnam exam, eclipsing the best human tally of 90. Its documentation claims a single model both generates and verifies proofs, relying on internal critique rather than external math engines.

For harder questions, the system reportedly scales up test‑time computation, though the exact mechanism is left vague. The results, if reproducible, would place DeepSeek in direct competition with Western labs that have pursued similar benchmarks. Yet the evidence rests on the startup's own reporting; independent validation has not been presented.

Moreover, the claim of “gold‑medal‑level” performance hinges on how the contests were administered to an AI, a detail that remains unclear. The model’s self‑refinement approach is intriguing, but without third‑party audits the broader community cannot yet gauge the true significance of these scores.


Common Questions Answered

What claim does DeepSeekMath‑V2 make about proof generation and verification?

DeepSeekMath‑V2 claims it can both generate mathematical proofs and verify them internally, without relying on any external math software. The model uses an internal critique loop to refine its solutions, aiming to handle the full reasoning cycle in a single system.
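A critique loop of this kind typically alternates between drafting, self-criticism, and revision until the critique finds no remaining flaws. The sketch below shows only the control flow; all function bodies are hypothetical stand-ins, not DeepSeek's implementation.

```python
# Illustrative self-critique loop: draft a proof, ask the same model to
# list flaws, refine, and stop once no issues remain or rounds run out.
# All functions are hypothetical stand-ins for model calls.

def draft_proof(problem: str) -> str:
    # Stand-in for the model's first attempt at a proof.
    return f"draft proof of {problem}"


def critique(proof: str) -> list[str]:
    # Stand-in for the model listing flaws in its own proof.
    return ["gap in step 2"] if "revised" not in proof else []


def refine(proof: str, issues: list[str]) -> str:
    # Stand-in for rewriting the proof to address the listed flaws.
    return f"revised {proof} (addressed: {', '.join(issues)})"


def prove_with_critique(problem: str, max_rounds: int = 4) -> str:
    proof = draft_proof(problem)
    for _ in range(max_rounds):
        issues = critique(proof)
        if not issues:
            break
        proof = refine(proof, issues)
    return proof


result = prove_with_critique("AM-GM inequality")
```

The cap on rounds matters in practice: without it, a model that never fully satisfies its own critique would loop indefinitely.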

How does DeepSeekMath‑V2 handle harder problems according to the headline experiments?

For more difficult questions, DeepSeekMath‑V2 scales up test‑time compute by sampling many candidate proofs and verifying them in parallel. This parallel verification increases confidence in the final solution without needing external tools.

What performance results did DeepSeekMath‑V2 achieve on major math competitions?

The company reports that DeepSeekMath‑V2 earned gold‑medal‑level scores at the 2025 International Mathematical Olympiad and the 2024 Chinese Mathematical Olympiad, and scored 118 out of 120 on the Putnam exam, surpassing the best human tally of 90.

Why is DeepSeekMath‑V2’s internal critique loop considered a differentiator from other AI math projects?

Most AI math projects rely on separate tools for proof generation and validation, but DeepSeekMath‑V2’s internal critique loop allows a single model to both create and check proofs. This integration reduces dependency on external engines and aims to close the performance gap with leading US labs.
