Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems
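
As a rough illustration of the setup described above (a conjecture, a knowledge base of known facts, and a proof generated automatically), here is a minimal sketch of forward chaining over propositional rules. It is a toy, not the method of any paper listed on this page; the `prove` function, the `Rule` type, and the symbols `p`, `q`, `r`, `s` are all hypothetical names chosen for the example.

```python
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical rule format: (premises, conclusion).
# If every premise is already known, the conclusion may be derived.
Rule = Tuple[FrozenSet[str], str]


def prove(facts: List[str], rules: List[Rule], conjecture: str) -> Optional[List[str]]:
    """Forward-chain from the facts; return a derivation trace, or None if stuck."""
    known = set(facts)
    steps = [f"{fact} (given)" for fact in facts]
    changed = True
    # Keep applying rules until the conjecture is derived or nothing new appears.
    while changed and conjecture not in known:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                steps.append(f"{conclusion} (from {', '.join(sorted(premises))})")
                changed = True
    return steps if conjecture in known else None


# Knowledge base: facts p and s, plus the rules p -> q and (q and s) -> r.
example_rules: List[Rule] = [(frozenset({"p"}), "q"), (frozenset({"q", "s"}), "r")]
example_proof = prove(["p", "s"], example_rules, "r")
if example_proof is not None:
    for line in example_proof:
        print(line)
```

The returned trace records every fact derived along the way, so it may contain steps the conjecture does not strictly need. Provers for richer logics (first-order, higher-order, dependent types) replace this exhaustive loop with guided search, which is where the learned components in the papers below typically come in.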

Papers

Showing 101–150 of 288 papers

Title	Date	Tasks	Status	Hype
REFACTOR: Learning to Extract Theorems from Proofs	Feb 26, 2024	Automated Theorem Proving	Code Available	0
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning	Feb 23, 2024	Arithmetic Reasoning, Automated Theorem Proving	Code Available	2
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data	Feb 14, 2024	Automated Theorem Proving, Language Modelling	Code Available	1
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified	0
0-1 laws for pattern occurrences in phylogenetic trees and networks	Feb 7, 2024	10-shot image generation	Unverified	0
Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors	Feb 6, 2024	Automated Theorem Proving, Game of Go	Unverified	0
Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint Solving	Jan 22, 2024	Automated Theorem Proving	Code Available	0
Graph2Tac: Online Representation Learning of Formal Math Concepts	Jan 5, 2024	AI Agent, Automated Theorem Proving	Unverified	0
Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method	Dec 20, 2023	Automated Theorem Proving, Data Augmentation	Unverified	0
Automated Planning Techniques for Elementary Proofs in Abstract Algebra	Dec 11, 2023	Abstract Algebra, Automated Theorem Proving	Unverified	0
Large Language Models' Understanding of Math: Source Criticism and Extrapolation	Nov 12, 2023	Automated Theorem Proving, Math	Unverified	0
Generative Learning of Continuous Data by Tensor Networks	Oct 31, 2023	Automated Theorem Proving, Tensor Networks	Unverified	0
math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories	Oct 25, 2023	Automated Theorem Proving, Language Modeling	Unverified	0
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem Proving, Benchmarking	Code Available	0
Llemma: An Open Language Model For Mathematics	Oct 16, 2023	Arithmetic Reasoning, Automated Theorem Proving	Code Available	3
An In-Context Learning Agent for Formal Theorem-Proving	Oct 6, 2023	Automated Theorem Proving, In-Context Learning	Code Available	1
LEGO-Prover: Neural Theorem Proving with Growing Libraries	Oct 1, 2023	Automated Theorem Proving	Code Available	1
Lyra: Orchestrating Dual Correction in Automated Theorem Proving	Sep 27, 2023	Automated Theorem Proving, Hallucination	Code Available	1
The Mathematical Game	Sep 22, 2023	Automated Theorem Proving	Unverified	0
FIMO: A Challenge Formal Dataset for Automated Theorem Proving	Sep 8, 2023	Automated Theorem Proving	Code Available	1
Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics	Jul 4, 2023	Automated Theorem Proving, Math	Unverified	0
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models	Jun 27, 2023	Automated Theorem Proving, GPU	Code Available	2
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving	May 25, 2023	Automated Theorem Proving	Code Available	1
Theorem Proving in Dependently-Typed Higher-Order Logic -- Extended Preprint	May 24, 2023	Automated Theorem Proving, Translation	Code Available	0
An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural Representations	May 15, 2023	Automated Theorem Proving, Transfer Learning	Code Available	1
Translating SUMO-K to Higher-Order Set Theory	May 13, 2023	Automated Theorem Proving, Common Sense Reasoning	Unverified	0
Planning as Theorem Proving with Heuristics	Mar 23, 2023	Automated Theorem Proving	Unverified	0
Probabilistic unifying relations for modelling epistemic and aleatoric uncertainty: semantics and automated reasoning with theorem proving	Mar 16, 2023	Automated Theorem Proving, Probabilistic Programming	Unverified	0
Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models	Mar 14, 2023	Automated Theorem Proving, Deep Learning	Unverified	0
Lemmas: Generation, Selection, Application	Mar 10, 2023	Automated Theorem Proving	Code Available	0
Proceedings 11th International Workshop on Theorem Proving Components for Educational Software	Mar 9, 2023	Automated Theorem Proving	Unverified	0
Magnushammer: A Transformer-Based Approach to Premise Selection	Mar 8, 2023	Automated Theorem Proving, Language Modeling	Unverified	0
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level Mathematics	Feb 24, 2023	Abstract Algebra, Automated Theorem Proving	Code Available	1
Anti-unification and Generalization: A Survey	Feb 1, 2023	Automated Theorem Proving, Survey	Unverified	0
EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry	Dec 27, 2022	Automated Theorem Proving, Visual Reasoning	Unverified	0
Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions	Dec 20, 2022	Automated Theorem Proving, Code Generation	Code Available	2
Solving Quantified Modal Logic Problems by Translation to Classical Logics	Dec 19, 2022	Automated Theorem Proving, Translation	Code Available	0
Peano: Learning Formal Mathematical Reasoning	Nov 29, 2022	Automated Theorem Proving, Mathematical Reasoning	Code Available	1
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs	Oct 21, 2022	Automated Theorem Proving, Language Modeling	Code Available	1
TextGraphs-16 Natural Language Premise Selection Task: Zero-Shot Premise Selection with Prompting Generative Language Models	Oct 1, 2022	Automated Theorem Proving, Information Retrieval	Unverified	0
Keyword-based Natural Language Premise Selection for an Automatic Mathematical Statement Proving	Oct 1, 2022	Automated Theorem Proving, Information Retrieval	Unverified	0
Generating Compressed Combinatory Proof Structures -- An Approach to Automated First-Order Theorem Proving	Sep 26, 2022	Automated Theorem Proving	Unverified	0
Proceedings 38th International Conference on Logic Programming	Aug 4, 2022	Automated Theorem Proving, Data Integration	Unverified	0
CD Tools -- Condensed Detachment and Structure Generating Theorem Proving (System Description)	Jul 18, 2022	Automated Theorem Proving	Unverified	0
Learning to Prove Trigonometric Identities	Jul 14, 2022	Automated Theorem Proving, Imitation Learning	Unverified	0
Exploring Length Generalization in Large Language Models	Jul 11, 2022	Automated Theorem Proving, In-Context Learning	Unverified	0
Constrained Training of Neural Networks via Theorem Proving	Jul 8, 2022	Automated Theorem Proving, Code Generation	Unverified	0
Formal Specifications from Natural Language	Jun 4, 2022	Automated Theorem Proving	Unverified	0
Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis	May 27, 2022	Automated Theorem Proving, Program Synthesis	Unverified	0
NaturalProver: Grounded Mathematical Proof Generation with Language Models	May 25, 2022	Automated Theorem Proving, Language Modeling	Code Available	1

Datasets: miniF2F-test, miniF2F-valid, HolStep (Conditional), HOList benchmark, HolStep (Unconditional), Metamath set.mm, miniF2F-curriculum, CompCert, CoqGym

Benchmark Results

miniF2F-test

#	Model	Metric	Claimed	Verified	Status
1	Kimina-Prover-Preview	cumulative	80.74	—	Unverified
2	ProofAug	cumulative	66	—	Unverified
3	DeepSeek-Prover-V1.5	cumulative	63.5	—	Unverified
4	Subgoal-XL	cumulative	56.1	—	Unverified
5	DeepSeek-Prover	cumulative	52	—	Unverified
6	Lyra + GPT-4	cumulative	47.1	—	Unverified
7	LEGO-Prover ChatGPT	cumulative	47.1	—	Unverified
8	Decomposing the Enigma	cumulative	45.5	—	Unverified
9	Evariste	cumulative	41	—	Unverified
10	Evariste-7d	cumulative	40.6	—	Unverified

miniF2F-valid

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@64	58.6	—	Unverified
2	LEGO-Prover ChatGPT	Pass@100	57	—	Unverified
3	Lyra + GPT-4	Pass@100	52	—	Unverified
4	Evariste-7d	Pass@64	47.5	—	Unverified
5	GPT-f	Pass@64	47.3	—	Unverified
6	Evariste-1d	Pass@64	46.7	—	Unverified
7	DSP (62B Minerva informal)	Pass@100	43.9	—	Unverified
8	Lean GPT-f	Pass@8	29.3	—	Unverified
9	Lean tidy	Pass@1	16.8	—	Unverified
10	Metamath GPT-f	Pass@8	2	—	Unverified

HolStep (Conditional)

#	Model	Metric	Claimed	Verified	Status
1	MPNN-DagLSTM	Classification Accuracy	0.92	—	Unverified
2	FormulaNet	Classification Accuracy	0.9	—	Unverified
3	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
4	Siamese 1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified
5	Siamese 1D CNN	Classification Accuracy	0.82	—	Unverified

HOList benchmark

#	Model	Metric	Claimed	Verified	Status
1	4-hop GNN, sub-expression sharing	Percentage correct	49.95	—	Unverified
2	Tactic Dependent Loop	Percentage correct	38.88	—	Unverified
3	BoW2 (extra -ves)	Percentage correct	36.55	—	Unverified
4	Deeper Wider WaveNet	Percentage correct	32.65	—	Unverified

HolStep (Unconditional)

#	Model	Metric	Claimed	Verified	Status
1	FormulaNet	Classification Accuracy	0.9	—	Unverified
2	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
3	1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified
4	1D CNN	Classification Accuracy	0.83	—	Unverified

Metamath set.mm

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@32	72.4	—	Unverified
2	GPT-f	Percentage correct	56.2	—	Unverified
3	MetaGen-IL + Holophrasm	Percentage correct	22.1	—	Unverified
4	Holophrasm	Percentage correct	14.3	—	Unverified

miniF2F-curriculum

#	Model	Metric	Claimed	Verified	Status
1	Evariste-7d	Pass@64	42.5	—	Unverified
2	Evariste-1d	Pass@64	33.6	—	Unverified
3	Evariste	Pass@64	32.1	—	Unverified
4	GPT-f	Pass@64	30.6	—	Unverified

CompCert

#	Model	Metric	Claimed	Verified	Status
1	Proverbot9001	Percentage correct	19.36	—	Unverified
2	CoqGym/ASTactic	Percentage correct	4.99	—	Unverified

CoqGym

#	Model	Metric	Claimed	Verified	Status
1	ASTactic	Percentage correct	12.2	—	Unverified