Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems
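
The definition above can be illustrated with a toy example. The following is a minimal sketch, assuming a propositional Horn-clause setting and naive forward chaining; it is not the method of any paper listed on this page, and the names `Rule` and `prove` are hypothetical, introduced only for illustration. The knowledge base is a set of known facts plus rules, the conjecture is the goal, and the returned list of inference steps serves as the proof.

```python
from typing import FrozenSet, Iterable, List, Optional, Tuple

# A rule is (premises, conclusion): if every premise is known, the conclusion follows.
# Hypothetical type alias for this sketch only.
Rule = Tuple[FrozenSet[str], str]


def prove(facts: Iterable[str], rules: List[Rule], goal: str) -> Optional[List[str]]:
    """Naive forward chaining over propositional Horn clauses.

    Returns the list of inference steps performed if the goal becomes derivable,
    or None if no further facts can be derived and the goal is still unknown.
    The trace may contain steps that are not needed for the goal.
    """
    known = set(facts)
    steps: List[str] = []
    changed = True
    while goal not in known and changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule only when it adds a new fact and all premises hold.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                steps.append(f"{', '.join(sorted(premises))} |- {conclusion}")
                changed = True
    return steps if goal in known else None


if __name__ == "__main__":
    knowledge_base = {"human(socrates)"}
    rules = [(frozenset({"human(socrates)"}), "mortal(socrates)")]
    print(prove(knowledge_base, rules, "mortal(socrates)"))
```

Practical provers work over far richer logics (first-order, higher-order, or dependent type theories) and rely on guided search rather than exhaustive saturation, but the inputs and output have the same shape: known facts and a conjecture in, a checkable proof out.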

Papers

Showing 151–175 of 288 papers

Title	Date	Tasks	Status
Semantic Parsing of Mathematics by Context-based Learning from Aligned Corpora and Theorem Proving	Nov 29, 2016	Automated Theorem Proving, Semantic Parsing	Unverified
Simple Dataset for Proof Method Recommendation in Isabelle/HOL (Dataset Description)	Apr 21, 2020	Automated Theorem Proving, BIG-bench Machine Learning	Unverified
Social Network Processes in the Isabelle and Coq Theorem Proving Communities	Sep 22, 2016	Automated Theorem Proving	Unverified
TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning	Feb 19, 2021	Automated Theorem Proving, Deep Reinforcement Learning	Unverified
Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors	Feb 6, 2024	Automated Theorem Proving, Game of Go	Unverified
TextGraphs-16 Natural Language Premise Selection Task: Zero-Shot Premise Selection with Prompting Generative Language Models	Oct 1, 2022	Automated Theorem Proving, Information Retrieval	Unverified
The Horn Non-Clausal Class and its Polynomiality	Aug 31, 2021	Automated Theorem Proving	Unverified
The Limits of AI Explainability: An Algorithmic Information Theory Approach	Apr 29, 2025	Automated Theorem Proving	Unverified
The Mathematical Game	Sep 22, 2023	Automated Theorem Proving	Unverified
Theorem Proving Based on Semantics of DNA Strand Graph	Feb 15, 2017	Automated Theorem Proving	Unverified
The Role of Entropy in Guiding a Connection Prover	May 31, 2021	Automated Theorem Proving, Decision Making	Unverified
Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers	May 22, 2022	Automated Theorem Proving	Unverified
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving	Jun 20, 2025	Automated Theorem Proving, Diversity	Unverified
Towards a Geometry Automated Provers Competition	Feb 28, 2020	Automated Theorem Proving, CPU	Unverified
Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent	Jul 5, 2024	Automated Theorem Proving, In-Context Learning	Unverified
Towards Concise, Machine-discovered Proofs of Gödel's Two Incompleteness Theorems	May 6, 2020	Automated Theorem Proving, Vocal Bursts Valence Prediction	Unverified
Towards Evolutionary Theorem Proving for Isabelle/HOL	Apr 17, 2019	Automated Theorem Proving	Unverified
Towards Formal Fault Tree Analysis using Theorem Proving	May 8, 2015	Automated Theorem Proving	Unverified
Towards Machine Learning Induction	Dec 4, 2018	Automated Theorem Proving, BIG-bench Machine Learning	Unverified
Towards Neural Theorem Proving at Scale	Jul 21, 2018	Automated Theorem Proving, Representation Learning	Unverified
Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges	Dec 16, 2024	Automated Theorem Proving, Scientific Discovery	Unverified
Towards United Reasoning for Automatic Induction in Isabelle/HOL	May 25, 2020	Automated Theorem Proving	Unverified
Training a First-Order Theorem Prover from Synthetic Data	Mar 5, 2021	Automated Theorem Proving, BIG-bench Machine Learning	Unverified
Translating SUMO-K to Higher-Order Set Theory	May 13, 2023	Automated Theorem Proving, Common Sense Reasoning	Unverified
Vehicle: Interfacing Neural Network Verifiers with Interactive Theorem Provers	Feb 10, 2022	Automated Theorem Proving	Unverified

Datasets: miniF2F-test, miniF2F-valid, HolStep (Conditional), HOList benchmark, HolStep (Unconditional), Metamath set.mm, miniF2F-curriculum, CompCert, CoqGym

Benchmark Results

miniF2F-test

#	Model	Metric	Claimed	Verified	Status
1	Kimina-Prover-Preview	cumulative	80.74	—	Unverified
2	ProofAug	cumulative	66	—	Unverified
3	DeepSeek-Prover-V1.5	cumulative	63.5	—	Unverified
4	Subgoal-XL	cumulative	56.1	—	Unverified
5	DeepSeek-Prover	cumulative	52	—	Unverified
6	Lyra + GPT-4	cumulative	47.1	—	Unverified
7	LEGO-Prover ChatGPT	cumulative	47.1	—	Unverified
8	Decomposing the Enigma	cumulative	45.5	—	Unverified
9	Evariste	cumulative	41	—	Unverified
10	Evariste-7d	cumulative	40.6	—	Unverified

miniF2F-valid

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@64	58.6	—	Unverified
2	LEGO-Prover ChatGPT	Pass@100	57	—	Unverified
3	Lyra + GPT-4	Pass@100	52	—	Unverified
4	Evariste-7d	Pass@64	47.5	—	Unverified
5	GPT-f	Pass@64	47.3	—	Unverified
6	Evariste-1d	Pass@64	46.7	—	Unverified
7	DSP (62B Minerva informal)	Pass@100	43.9	—	Unverified
8	Lean GPT-f	Pass@8	29.3	—	Unverified
9	Lean tidy	Pass@1	16.8	—	Unverified
10	Metamath GPT-f	Pass@8	2	—	Unverified

HolStep (Conditional)

#	Model	Metric	Claimed	Verified	Status
1	MPNN-DagLSTM	Classification Accuracy	0.92	—	Unverified
2	FormulaNet	Classification Accuracy	0.9	—	Unverified
3	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
4	Siamese 1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified
5	Siamese 1D CNN	Classification Accuracy	0.82	—	Unverified

HOList benchmark

#	Model	Metric	Claimed	Verified	Status
1	4-hop GNN, sub-expression sharing	Percentage correct	49.95	—	Unverified
2	Tactic Dependent Loop	Percentage correct	38.88	—	Unverified
3	BoW2 (extra -ves)	Percentage correct	36.55	—	Unverified
4	Deeper Wider WaveNet	Percentage correct	32.65	—	Unverified

HolStep (Unconditional)

#	Model	Metric	Claimed	Verified	Status
1	FormulaNet	Classification Accuracy	0.9	—	Unverified
2	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
3	1D CNN	Classification Accuracy	0.83	—	Unverified
4	1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified

Metamath set.mm

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@32	72.4	—	Unverified
2	GPT-f	Percentage correct	56.2	—	Unverified
3	MetaGen-IL + Holophrasm	Percentage correct	22.1	—	Unverified
4	Holophrasm	Percentage correct	14.3	—	Unverified

miniF2F-curriculum

#	Model	Metric	Claimed	Verified	Status
1	Evariste-7d	Pass@64	42.5	—	Unverified
2	Evariste-1d	Pass@64	33.6	—	Unverified
3	Evariste	Pass@64	32.1	—	Unverified
4	GPT-f	Pass@64	30.6	—	Unverified

CompCert

#	Model	Metric	Claimed	Verified	Status
1	Proverbot9001	Percentage correct	19.36	—	Unverified
2	CoqGym/ASTactic	Percentage correct	4.99	—	Unverified

CoqGym

#	Model	Metric	Claimed	Verified	Status
1	ASTactic	Percentage correct	12.2	—	Unverified