LEAP: How LLMs Solved Every Putnam 2025 Problem

Google has published an interesting paper on LEAP, a system designed for automated theorem proving. It allows language models to construct formal, machine-verifiable proofs in the Lean language.

A proof in natural language is difficult to verify automatically—it almost always contains logical gaps. A formal proof is written in a machine language and checked by a compiler, which guarantees correctness, but writing it is much more difficult.

In this field, specialized models trained specifically for Lean typically take the lead. LEAP, operating as an agentic framework, uses general models by breaking the problem into parts and recursively correcting errors based on compiler feedback.

Success at the Putnam 2025 Competition

The project's main research victory is the Putnam 2025 competition, an annual mathematics competition for college students in the US.

LEAP formally solved all 12 problems, whereas neither Gemini 3.1 Pro on its own nor the open-source specialized Goedel-Prover-V2 solved a single one. The authors evaluated the closed-source Aristotle system, which achieved a gold medal-level result at the IMO 2025 Math Olympiad, at 9 out of 12 solved problems in their own tests.

IMO-LeanProofBench and Other Achievements

The paper also introduces IMO-LeanProofBench, a dataset of 60 formalized Olympiad-level problems in Lean that require highly unconventional deductions and structurally complex proofs. On this benchmark, LEAP achieves an average of around 70% across the basic and advanced sets, compared to approximately 48% for Aristotle.

An additional achievement: LEAP formally verified the auxiliary part of a combinatorial problem tracing back to mathematician Donald Knuth, generating over 5000 lines of code in Lean 4.

The Irony of Closed Source

It is worth noting a certain irony: in the article, the authors criticize closed competitor systems (Axiom3, Numina, Aristotle) because they are unavailable for scientific verification. However, the code for LEAP itself is also unpublished. Opening up the final proofs partially mitigates this (they can be rechecked by the Lean compiler), but full reproducibility of Google's project is not yet possible.

LEAP: How LLMs Solved Every Putnam 2025 Problem

Success at the Putnam 2025 Competition

IMO-LeanProofBench and Other Achievements

The Irony of Closed Source

Google and Nvidia Shift Orders to Intel Amid TSMC Capacity Shortages

Claude Fable 5: The Next Generation of Frontier Intelligence

Related Articles

Claude Opus 5: Near-Fable Intelligence at Half the Price

Google Ships Gemini 3.6 Flash at a Lower Price Than 3.5

GPT-5.6 Ships as Three Models That Differ Only in Price

Continue Your AI Journey