Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems
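
As a toy illustration of the setup described above (a conjecture, a knowledge base of known facts, and a proof as output), the sketch below runs forward chaining over propositional Horn rules. It is only a minimal example under simplifying assumptions, not the method of any paper listed on this page; the names `prove`, `Rule`, and the atoms `p`, `q`, `r` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# A rule is (premises, conclusion); all atoms are plain strings (an assumption
# made for illustration, real provers work over much richer formal languages).
Rule = Tuple[Tuple[str, ...], str]


def prove(facts: Set[str], rules: List[Rule], goal: str) -> Optional[List[str]]:
    """Forward chaining: return the derivation steps for `goal`, or None."""
    # Map each derived atom to the rule that produced it (None for known facts).
    derived_by: Dict[str, Optional[Rule]] = {f: None for f in facts}
    changed = True
    while changed and goal not in derived_by:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived_by and all(p in derived_by for p in premises):
                derived_by[conclusion] = (premises, conclusion)
                changed = True
    if goal not in derived_by:
        return None

    # Walk back from the goal so the proof lists only the steps it relies on.
    steps: List[str] = []
    seen: Set[str] = set()

    def explain(atom: str) -> None:
        if atom in seen:
            return
        seen.add(atom)
        rule = derived_by[atom]
        if rule is None:
            steps.append(f"{atom} (known fact)")
            return
        for premise in rule[0]:
            explain(premise)
        steps.append(f"{atom} (from {', '.join(rule[0])})")

    explain(goal)
    return steps


# Knowledge base: fact p, rules p -> q and q -> r. Conjecture: r.
knowledge_base = {"p"}
rules = [(("p",), "q"), (("q",), "r")]
for step in prove(knowledge_base, rules, "r") or []:
    print(step)
```

The function returns `None` when the conjecture cannot be derived from the knowledge base, which is the case a real prover would report as a failed or timed-out proof search.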

Papers

Showing 51–100 of 288 papers

Title	Date	Tasks	Status	Hype
AI Descartes: Combining Data and Theory for Derivable Scientific Discovery	Sep 3, 2021	Automated Theorem Proving, BIG-bench Machine Learning	Code Available	1
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics	Aug 31, 2021	Automated Theorem Proving	Code Available	1
ProoFVer: Natural Logic Theorem Proving for Fact Verification	Aug 25, 2021	Automated Theorem Proving, counterfactual	Code Available	1
Learning Theorem Proving Components	Jul 21, 2021	Automated Theorem Proving, Graph Neural Network	Code Available	1
NaturalProofs: Mathematical Theorem Proving in Natural Language	Mar 24, 2021	Automated Theorem Proving, Domain Generalization	Code Available	1
Proof Artifact Co-training for Theorem Proving with Language Models	Feb 11, 2021	Automated Theorem Proving, Imitation Learning	Code Available	1
Learning as Abduction: Trainable Natural Logic Theorem Prover for Natural Language Inference	Oct 29, 2020	Automated Theorem Proving, Natural Language Inference	Code Available	1
Measuring Systematic Generalization in Neural Proof Generation with Transformers	Sep 30, 2020	Automated Theorem Proving, Logical Reasoning	Code Available	1
INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving	Jul 6, 2020	Automated Theorem Proving	Code Available	1
Logical Neural Networks	Jun 23, 2020	Automated Theorem Proving, Logical Reasoning	Code Available	1
Logical Inferences with Comparatives and Generalized Quantifiers	May 16, 2020	Automated Theorem Proving, Natural Language Inference	Code Available	1
Prolog Technology Reinforcement Learning Prover	Apr 15, 2020	Automated Theorem Proving, reinforcement-learning	Code Available	1
Learning to Prove Theorems by Learning to Generate Theorems	Feb 17, 2020	Automated Theorem Proving	Code Available	1
A Deep Reinforcement Learning Approach to First-Order Logic Theorem Proving	Nov 5, 2019	Automated Theorem Proving, Deep Reinforcement Learning	Code Available	1
LangPro: Natural Language Theorem Prover	Aug 30, 2017	Automated Theorem Proving, Natural Language Inference	Code Available	1
Prover Agent: An Agent-based Framework for Formal Mathematical Proofs	Jun 24, 2025	AI Agent, Automated Theorem Proving	Unverified	0
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving	Jun 20, 2025	Automated Theorem Proving, Diversity	Unverified	0
MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?	Jun 6, 2025	Automated Theorem Proving, Visual Reasoning	Unverified	0
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening	Jun 3, 2025	Automated Theorem Proving	Unverified	0
Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations	May 30, 2025	Automated Theorem Proving, Natural Language Inference	Unverified	0
ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction	May 30, 2025	Automated Theorem Proving	Unverified	0
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation	May 28, 2025	Automated Theorem Proving, Retrieval	Unverified	0
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions	May 24, 2025	Automated Theorem Proving, Math	Code Available	0
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement	May 21, 2025	Automated Theorem Proving, Mathematical Proofs	Unverified	0
MIRB: Mathematical Information Retrieval Benchmark	May 21, 2025	Automated Theorem Proving, Information Retrieval	Code Available	0
LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data Generation	May 17, 2025	Automated Theorem Proving, Synthetic Data Generation	Code Available	0
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation	May 16, 2025	Automated Theorem Proving	Unverified	0
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning	May 9, 2025	Automated Theorem Proving	Unverified	0
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving	May 7, 2025	Automated Theorem Proving	Unverified	0
Proceedings The 13th International Workshop on Theorem proving components for Educational software	May 7, 2025	Automated Theorem Proving	Unverified	0
The Limits of AI Explainability: An Algorithmic Information Theory Approach	Apr 29, 2025	Automated Theorem Proving	Unverified	0
Hua-Chen New Theory of Economic Optimization	Apr 27, 2025	Automated Theorem Proving, Survey	Unverified	0
APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries	Apr 27, 2025	Automated Theorem Proving, Bug fixing	Unverified	0
Hierarchical Attention Generates Better Proofs	Apr 27, 2025	Automated Theorem Proving, Mathematical Proofs	Code Available	0
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification	Apr 23, 2025	Automated Theorem Proving	Unverified	0
Reasoning Models Can Be Effective Without Thinking	Apr 14, 2025	Automated Theorem Proving, Mathematical Problem-Solving	Unverified	0
Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection	Apr 13, 2025	Answer Selection, Automated Theorem Proving	Unverified	0
Reasoning Under Threat: Symbolic and Neural Techniques for Cybersecurity Verification	Mar 27, 2025	Automated Theorem Proving, Formal Logic	Unverified	0
A Survey on Mathematical Reasoning and Optimization with Large Language Models	Mar 22, 2025	Automated Theorem Proving, Heuristic Search	Code Available	0
Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview	Mar 13, 2025	Automated Theorem Proving, software testing	Unverified	0
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving	Mar 12, 2025	Automated Theorem Proving, Reinforcement Learning (RL)	Unverified	0
Efficient Neural Clause-Selection Reinforcement	Mar 10, 2025	Automated Theorem Proving, CPU	Unverified	0
Faithful Logic Embeddings in HOL -- Deep and Shallow	Feb 26, 2025	All, Automated Theorem Proving	Unverified	0
Quantum Machine Learning in Precision Medicine and Drug Discovery -- A Game Changer for Tailored Treatments?	Feb 25, 2025	Automated Theorem Proving, Drug Discovery	Unverified	0
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction	Feb 25, 2025	Automated Theorem Proving, Mathematical Reasoning	Unverified	0
A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation	Feb 25, 2025	Automated Theorem Proving, Language Modeling	Unverified	0
Activation Steering in Neural Theorem Provers	Feb 21, 2025	Automated Theorem Proving	Unverified	0
Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs	Feb 16, 2025	Automated Theorem Proving, Mathematical Proofs	Unverified	0
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation	Feb 8, 2025	Automated Theorem Proving, Code Generation	Unverified	0
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving	Feb 5, 2025	Automated Theorem Proving	Unverified	0

Datasets: miniF2F-test, miniF2F-valid, HolStep (Conditional), HOList benchmark, HolStep (Unconditional), Metamath set.mm, miniF2F-curriculum, CompCert, CoqGym

Benchmark Results

miniF2F-test

#	Model	Metric	Claimed	Verified	Status
1	Kimina-Prover-Preview	cumulative	80.74	—	Unverified
2	ProofAug	cumulative	66	—	Unverified
3	DeepSeek-Prover-V1.5	cumulative	63.5	—	Unverified
4	Subgoal-XL	cumulative	56.1	—	Unverified
5	DeepSeek-Prover	cumulative	52	—	Unverified
6	Lyra + GPT-4	cumulative	47.1	—	Unverified
7	LEGO-Prover ChatGPT	cumulative	47.1	—	Unverified
8	Decomposing the Enigma	cumulative	45.5	—	Unverified
9	Evariste	cumulative	41	—	Unverified
10	Evariste-7d	cumulative	40.6	—	Unverified

miniF2F-valid

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@64	58.6	—	Unverified
2	LEGO-Prover ChatGPT	Pass@100	57	—	Unverified
3	Lyra + GPT-4	Pass@100	52	—	Unverified
4	Evariste-7d	Pass@64	47.5	—	Unverified
5	GPT-f	Pass@64	47.3	—	Unverified
6	Evariste-1d	Pass@64	46.7	—	Unverified
7	DSP (62B Minerva informal)	Pass@100	43.9	—	Unverified
8	Lean GPT-f	Pass@8	29.3	—	Unverified
9	Lean tidy	Pass@1	16.8	—	Unverified
10	Metamath GPT-f	Pass@8	2	—	Unverified

HolStep (Conditional)

#	Model	Metric	Claimed	Verified	Status
1	MPNN-DagLSTM	Classification Accuracy	0.92	—	Unverified
2	FormulaNet	Classification Accuracy	0.9	—	Unverified
3	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
4	Siamese 1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified
5	Siamese 1D CNN	Classification Accuracy	0.82	—	Unverified

HOList benchmark

#	Model	Metric	Claimed	Verified	Status
1	4-hop GNN, sub-expression sharing	Percentage correct	49.95	—	Unverified
2	Tactic Dependent Loop	Percentage correct	38.88	—	Unverified
3	BoW2 (extra -ves)	Percentage correct	36.55	—	Unverified
4	Deeper Wider WaveNet	Percentage correct	32.65	—	Unverified

HolStep (Unconditional)

#	Model	Metric	Claimed	Verified	Status
1	FormulaNet	Classification Accuracy	0.9	—	Unverified
2	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
3	1D CNN	Classification Accuracy	0.83	—	Unverified
4	1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified

Metamath set.mm

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@32	72.4	—	Unverified
2	GPT-f	Percentage correct	56.2	—	Unverified
3	MetaGen-IL + Holophrasm	Percentage correct	22.1	—	Unverified
4	Holophrasm	Percentage correct	14.3	—	Unverified

miniF2F-curriculum

#	Model	Metric	Claimed	Verified	Status
1	Evariste-7d	Pass@64	42.5	—	Unverified
2	Evariste-1d	Pass@64	33.6	—	Unverified
3	Evariste	Pass@64	32.1	—	Unverified
4	GPT-f	Pass@64	30.6	—	Unverified

CompCert

#	Model	Metric	Claimed	Verified	Status
1	Proverbot9001	Percentage correct	19.36	—	Unverified
2	CoqGym/ASTactic	Percentage correct	4.99	—	Unverified

CoqGym

#	Model	Metric	Claimed	Verified	Status
1	ASTactic	Percentage correct	12.2	—	Unverified