Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.
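To make the task statement above concrete, here is a minimal Lean 4 sketch, offered as an illustration rather than as anything taken from the listed papers. The conjecture is the statement of `target` (a hypothetical name chosen for this example), the knowledge base is represented by two lemmas from Lean's core library (`Nat.add_zero` and `Nat.zero_add`), and the tactic script after `by` is the kind of proof an automated prover would be asked to generate.

```lean
-- Knowledge base (already proved in Lean core):
--   Nat.add_zero : n + 0 = n
--   Nat.zero_add : 0 + n = n

-- Conjecture: the statement before `:=`. `target` is a hypothetical name.
theorem target (a b : Nat) (h : a = b) : a + 0 = 0 + b := by
  -- Proof: rewrite with the two known facts, then close the goal with `h`.
  rw [Nat.add_zero, Nat.zero_add]
  exact h
```

The proof assistant checks the generated script mechanically, so a candidate proof either closes the goal or is rejected.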

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers


Showing 1–25 of 288 papers

Title	Date	Tasks	Status	Hype
Autoformalization in the Era of Large Language Models: A Survey	May 29, 2025	Automated Theorem Proving	Code Available	5
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition	Apr 30, 2025	Automated Theorem Proving, Large Language Model	Code Available	5
Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean	Apr 18, 2024	Automated Theorem Proving, Hallucination	Code Available	5
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover	Jul 24, 2024	Automated Theorem Proving, Math	Code Available	4
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search	Aug 15, 2024	Automated Theorem Proving, Language Modeling	Code Available	4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems	Oct 21, 2024	Automated Theorem Proving, CPU	Code Available	4
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems	Jun 6, 2024	Automated Theorem Proving, Math	Code Available	4
miniCTX: Neural Theorem Proving with (Long-)Contexts	Aug 5, 2024	Automated Theorem Proving	Code Available	4
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning	Apr 15, 2025	Automated Theorem Proving, Large Language Model	Code Available	3
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving	Feb 11, 2025	Automated Theorem Proving, Large Language Model	Code Available	3
A Survey on Deep Learning for Theorem Proving	Apr 15, 2024	Automated Theorem Proving, Deep Learning	Code Available	3
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition	Jul 15, 2024	Automated Theorem Proving	Code Available	3
Llemma: An Open Language Model For Mathematics	Oct 16, 2023	Arithmetic Reasoning, Automated Theorem Proving	Code Available	3
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	Feb 23, 2024	Arithmetic Reasoning, Automated Theorem Proving	Code Available	2
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving	Jan 31, 2025	Automated Theorem Proving	Code Available	2
Formal Mathematics Statement Curriculum Learning	Feb 3, 2022	Automated Theorem Proving, Language Modeling	Code Available	2
Pantograph: A Machine-to-Machine Interaction Interface for Advanced Theorem Proving, High Level Reasoning, and Data Extraction in Lean 4	Oct 21, 2024	Automated Theorem Proving	Code Available	2
Learning Formal Mathematics From Intrinsic Motivation	Jun 30, 2024	Automated Theorem Proving, Language Modeling	Code Available	2
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions	Dec 20, 2022	Automated Theorem Proving, Code Generation	Code Available	2
LeanExplore: A search engine for Lean 4 declarations	Jun 4, 2025	Automated Theorem Proving	Code Available	2
LeanAgent: Lifelong Learning for Formal Theorem Proving	Oct 8, 2024	Abstract Algebra, Automated Theorem Proving	Code Available	2
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models	Jun 27, 2023	Automated Theorem Proving, GPU	Code Available	2
AI Descartes: Combining Data and Theory for Derivable Scientific Discovery	Sep 3, 2021	Automated Theorem Proving, BIG-bench Machine Learning	Code Available	1
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities	May 19, 2025	Automated Theorem Proving, Benchmarking	Code Available	1
An In-Context Learning Agent for Formal Theorem-Proving	Oct 6, 2023	Automated Theorem Proving, In-Context Learning	Code Available	1


Datasets: miniF2F-test, miniF2F-valid, HolStep (Conditional), HOList benchmark, HolStep (Unconditional), Metamath set.mm, miniF2F-curriculum, CompCert, CoqGym

Benchmark Results

miniF2F-test

#	Model	Metric	Claimed	Verified	Status
1	Kimina-Prover-Preview	cumulative	80.74	—	Unverified
2	ProofAug	cumulative	66	—	Unverified
3	DeepSeek-Prover-V1.5	cumulative	63.5	—	Unverified
4	Subgoal-XL	cumulative	56.1	—	Unverified
5	DeepSeek-Prover	cumulative	52	—	Unverified
6	Lyra + GPT-4	cumulative	47.1	—	Unverified
7	LEGO-Prover ChatGPT	cumulative	47.1	—	Unverified
8	Decomposing the Enigma	cumulative	45.5	—	Unverified
9	Evariste	cumulative	41	—	Unverified
10	Evariste-7d	cumulative	40.6	—	Unverified

miniF2F-valid

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@64	58.6	—	Unverified
2	LEGO-Prover ChatGPT	Pass@100	57	—	Unverified
3	Lyra + GPT-4	Pass@100	52	—	Unverified
4	Evariste-7d	Pass@64	47.5	—	Unverified
5	GPT-f	Pass@64	47.3	—	Unverified
6	Evariste-1d	Pass@64	46.7	—	Unverified
7	DSP (62B Minerva informal)	Pass@100	43.9	—	Unverified
8	Lean GPT-f	Pass@8	29.3	—	Unverified
9	Lean tidy	Pass@1	16.8	—	Unverified
10	Metamath GPT-f	Pass@8	2	—	Unverified

HolStep (Conditional)

#	Model	Metric	Claimed	Verified	Status
1	MPNN-DagLSTM	Classification Accuracy	0.92	—	Unverified
2	FormulaNet	Classification Accuracy	0.9	—	Unverified
3	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
4	Siamese 1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified
5	Siamese 1D CNN	Classification Accuracy	0.82	—	Unverified

HOList benchmark

#	Model	Metric	Claimed	Verified	Status
1	4-hop GNN, sub-expression sharing	Percentage correct	49.95	—	Unverified
2	Tactic Dependent Loop	Percentage correct	38.88	—	Unverified
3	BoW2 (extra -ves)	Percentage correct	36.55	—	Unverified
4	Deeper Wider WaveNet	Percentage correct	32.65	—	Unverified

HolStep (Unconditional)

#	Model	Metric	Claimed	Verified	Status
1	FormulaNet	Classification Accuracy	0.9	—	Unverified
2	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
3	1D CNN	Classification Accuracy	0.83	—	Unverified
4	1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified

Metamath set.mm

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@32	72.4	—	Unverified
2	GPT-f	Percentage correct	56.2	—	Unverified
3	MetaGen-IL + Holophrasm	Percentage correct	22.1	—	Unverified
4	Holophrasm	Percentage correct	14.3	—	Unverified

miniF2F-curriculum

#	Model	Metric	Claimed	Verified	Status
1	Evariste-7d	Pass@64	42.5	—	Unverified
2	Evariste-1d	Pass@64	33.6	—	Unverified
3	Evariste	Pass@64	32.1	—	Unverified
4	GPT-f	Pass@64	30.6	—	Unverified

CompCert

#	Model	Metric	Claimed	Verified	Status
1	Proverbot9001	Percentage correct	19.36	—	Unverified
2	CoqGym/ASTactic	Percentage correct	4.99	—	Unverified

CoqGym

#	Model	Metric	Claimed	Verified	Status
1	ASTactic	Percentage correct	12.2	—	Unverified