DeepSeek Pushes Open-Source Math AI to Olympiad Heights
The Chinese lab says its open-weight DeepSeekMath-V2 matches recent DeepMind and OpenAI results and is now freely downloadable.
China-based AI lab DeepSeek has released a new open-weight model, DeepSeekMath‑V2, claiming it solved five of the six problems from the International Mathematical Olympiad (IMO) 2025, matching the “gold-medalist” level performance reported earlier this year by Google DeepMind and OpenAI.
The lab said the model has delivered strong results beyond the IMO, including top-tier performance at China’s national Olympiad and near-perfect scores on the 2024 undergraduate Putnam exam.
On Putnam, DeepSeekMath-V2 reportedly solved 11 out of 12 problems and scored 118 out of 120, a result the company said eclipses the top human score of 90.
DeepSeek released the model weights, or the core files needed to run DeepSeekMath-V2, under an Apache-2.0 open-source license on Hugging Face, a platform used by researchers and developers worldwide to download, run, and fine-tune large models.
Hugging Face co-founder and CEO Clement Delangue described the release as a milestone, saying, “Imagine owning the brain of one of the best mathematicians in the world for free,” and added, “As far as I know, there isn’t any chatbot or API that gives you access to an IMO 2025 gold-medalist model.”
DeepSeekMath-V2 uses a “verifier-generator” dual-engine architecture in which a proof generator drafts solutions and a separate verifier checks their correctness, sending mistakes back for revision.
The DeepSeek team said the breakthrough lies not just in “getting the right answers,” but in ensuring rigorous, step-by-step reasoning. Unlike many earlier math-specialist LLMs that optimized for final-answer correctness on benchmarks such as AIME and MATH, DeepSeekMath-V2 emphasizes formal proof structure and self-verification, enabling it to avoid errors typical of models that rely solely on pattern matching.
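The generate-verify-revise loop described above can be sketched in miniature. This is an illustrative toy, not DeepSeekMath-V2's implementation: all names (`generate`, `verify`, `prove`, `Verdict`) are hypothetical, and the "proof" task is a stand-in arithmetic identity rather than a real theorem.

```python
from dataclasses import dataclass

# Toy sketch of a generator-verifier loop in the style DeepSeek describes.
# In the real system both roles are large language models; here they are
# hand-written stubs so the control flow is visible.

@dataclass
class Verdict:
    valid: bool
    feedback: str

def generate(problem: str, feedback: str = "") -> str:
    # Stand-in for the proof generator: "proves" sum(1..n),
    # fixing an off-by-one only after the verifier flags it.
    n = int(problem.split("=")[-1])
    if "off-by-one" in feedback:
        return f"sum(1..{n}) = {n * (n + 1) // 2}"
    return f"sum(1..{n}) = {n * n // 2}"  # deliberately flawed first draft

def verify(problem: str, proof: str) -> Verdict:
    # Stand-in for the verifier: checks the claimed closed form.
    n = int(problem.split("=")[-1])
    claimed = int(proof.split("=")[-1])
    if claimed == n * (n + 1) // 2:
        return Verdict(True, "proof accepted")
    return Verdict(False, "off-by-one in the closed form; revise")

def prove(problem: str, max_rounds: int = 4) -> str:
    feedback = ""
    for _ in range(max_rounds):
        draft = generate(problem, feedback)
        verdict = verify(problem, draft)
        if verdict.valid:
            return draft
        feedback = verdict.feedback  # send mistakes back for revision
    raise RuntimeError("no verified proof within budget")

print(prove("n=10"))  # -> sum(1..10) = 55
```

The key design point the loop captures is that correctness is judged by a separate checker rather than by the generator's own confidence, which is what distinguishes this setup from final-answer benchmark optimization.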
The announcement comes amid a wave of AI research efforts pushing the boundaries of formal reasoning. Earlier this year, DeepMind’s natural-language model Gemini Deep Think officially earned a gold-medal grade from IMO judges for solving five of six problems under competition constraints.
The public release of DeepSeekMath-V2 makes it the first widely accessible "Olympiad-level" reasoning engine, potentially lowering the barrier to advanced mathematical exploration and automation.
For educators, researchers and developers, especially in parts of the world without access to proprietary models, it may offer a powerful tool for theorem proving, academic research or advanced problem solving.
But several critical questions remain. Independent peer-reviewed evaluation of the claimed performances is still awaited. Some reports of the release noted discrepancies in architecture and parameter count: while the announcement speaks of a 685 billion-parameter model, supplementary technical summaries reference a 236B mixture-of-experts architecture with only 21B active parameters.
Moreover, success on a single competition or benchmark does not guarantee broader reliability in general mathematics research. AI models may still struggle with originality, deep abstraction, or unfamiliar problem domains. As past efforts such as DeepMind's formal-proof systems suggest, scaling from contest problems to open-ended mathematics remains a hard leap.
Still, for now, DeepSeekMath-V2 stands as the most accessible AI to claim, and in part demonstrate, “human-gold-medalist-level” mathematical reasoning.