Nous Research's Nomos 1 ranks second on Putnam, trailing DeepSeekMath-V2
Nous Research just dropped Nomos 1, an open-source model that clinched second place on the notoriously brutal Putnam math exam. Only DeepSeek's DeepSeekMath-V2 finished ahead of it, notching 118 out of 120 points on the 2024 William Lowell Putnam Competition. While the Putnam has long been a benchmark for elite human problem-solvers, it is increasingly being used to gauge AI's raw mathematical reasoning.
Here, an openly available system is holding its own against proprietary offerings from DeepSeek, Google and OpenAI. But the numbers raise a question most readers will want answered: how does Nomos 1 stack up against the leading mathematical AI projects from those three tech giants?
How Nomos 1 compares to mathematical AI systems from DeepSeek, Google, and OpenAI
The Nomos 1 results arrive amid a flurry of advances in mathematical reasoning AI. DeepSeek's model, DeepSeekMath-V2, scored 118 out of 120 points on questions from the 2024 William Lowell Putnam Mathematical Competition, beating the top human score of 90, and it also performed at the level of gold-medal winners at the International Mathematical Olympiad. At this year's Olympiad, Google's advanced Gemini model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions, all within the 4.5-hour competition time limit.
Nomos 1's second-place finish on the Putnam marks a clear step forward for open-source mathematical AI. With the competition's top human score at 90 out of 120 and the median at just two points, the system surpassed most contestants, demonstrating that non-proprietary models can now operate at near-elite levels. Yet DeepSeekMath-V2, which posted 118 points on the same question set, still holds the lead, leaving a gap that Nomos 1 has yet to close.
These advances from DeepSeek, Google and OpenAI underscore how quickly the field is evolving. Whether further refinements to Nomos 1 will close that gap is still uncertain. What is evident, however, is that open-source efforts are no longer peripheral to high-stakes mathematical testing.
The community now has a concrete benchmark to build upon, and the next iterations will likely be judged against both the Putnam scores and the opaque metrics behind DeepSeek’s near‑perfect run.
Further Reading
- Nomos 1 Turns Putnam Into a Math Benchmark - Smol AI News
- Nous Research’s Nomos 1: Open-Source 30B Math Specialist Aiming for Putnam-Level Reasoning - LessWrong
- DeepSeekMath V2: Building a Strong Mathematical Reasoning Model via Reinforcement Learning from Evol-Instruct Feedback - arXiv
- DeepSeekMath: Pushing the Limits of Open-Source Mathematical Reasoning Models - arXiv
- Open-Source Models Are Catching Up on Hard Reasoning Benchmarks - SemiAnalysis
Common Questions Answered
What rank did Nous Research's Nomos 1 achieve on the 2024 Putnam Competition?
Nomos 1 secured second place on the 2024 William Lowell Putnam Mathematical Competition, outperforming the majority of human contestants and many proprietary AI models.
How many points did DeepSeekMath‑V2 score on the same Putnam exam?
DeepSeekMath‑V2 earned 118 out of a possible 120 points on the 2024 Putnam exam, surpassing the top human score of 90 and setting the benchmark for AI performance.
What does the article suggest about the performance gap between Nomos 1 and DeepSeekMath‑V2?
The article notes that while Nomos 1 performed impressively, a noticeable performance gap remains because DeepSeekMath‑V2 outscored it by 18 points, indicating that the open‑source model has not yet matched the proprietary leader.
How does the median human score on the Putnam compare to Nomos 1's result?
The median human score on the Putnam was only two points, while Nomos 1 finished in second place overall, showing that the open-source model operates at a level well beyond the typical human contestant.