Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems
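
To make the description above concrete, here is a minimal Lean 4 sketch of the three ingredients: a knowledge base, a conjecture, and a proof. All names (`Person`, `Human`, `Mortal`, `socrates`, `h_rule`, `h_fact`) are hypothetical and chosen only for illustration; they are not taken from the cited paper or from any benchmark listed here.

```lean
-- Knowledge base: two known facts, supplied as hypotheses.
-- Conjecture: the statement after the final colon.
-- Proof: the term after `:=`, which a prover would have to find on its own.
theorem socrates_is_mortal
    (Person : Type) (Human Mortal : Person → Prop)
    (socrates : Person)
    (h_rule : ∀ x, Human x → Mortal x)  -- known fact: every human is mortal
    (h_fact : Human socrates)           -- known fact: Socrates is human
    : Mortal socrates :=                -- conjecture (the target theorem)
  h_rule socrates h_fact                -- proof: apply the rule to the fact
```

Here the proof is a single application of one fact to another; realistic conjectures require searching over many candidate steps, which is what the systems listed below try to automate.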

Papers

Showing 126–150 of 288 papers

Title	Date	Tasks	Status
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving	Apr 10, 2024	Automated Theorem Proving, Language Modeling	Code Available
Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry	Apr 9, 2024	Automated Theorem Proving, CPU	Unverified
Proceedings 12th International Workshop on Theorem proving components for Educational software	Apr 4, 2024	Automated Theorem Proving	Unverified
Multi-Task Learning with Multi-Task Optimization	Mar 24, 2024	Automated Theorem Proving, image-classification	Unverified
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code	Mar 19, 2024	Automated Theorem Proving, Code Generation	Unverified
Learning Guided Automated Reasoning: A Brief Survey	Mar 6, 2024	Automated Theorem Proving, Logical Reasoning	Unverified
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving	Mar 6, 2024	Automated Theorem Proving, Benchmarking	Unverified
A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic	Feb 28, 2024	Automated Theorem Proving, Information Retrieval	Unverified
REFACTOR: Learning to Extract Theorems from Proofs	Feb 26, 2024	Automated Theorem Proving	Code Available
On the (In)feasibility of ML Backdoor Detection as an Hypothesis Testing Problem	Feb 26, 2024	Automated Theorem Proving, Out-of-Distribution Detection	Code Available
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages	Feb 12, 2024	Automated Theorem Proving, Benchmarking	Unverified
0-1 laws for pattern occurrences in phylogenetic trees and networks	Feb 7, 2024	10-shot image generation	Unverified
Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors	Feb 6, 2024	Automated Theorem Proving, Game of Go	Unverified
Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint Solving	Jan 22, 2024	Automated Theorem Proving	Code Available
Graph2Tac: Online Representation Learning of Formal Math Concepts	Jan 5, 2024	AI Agent, Automated Theorem Proving	Unverified
Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method	Dec 20, 2023	Automated Theorem Proving, Data Augmentation	Unverified
Automated Planning Techniques for Elementary Proofs in Abstract Algebra	Dec 11, 2023	Abstract Algebra, Automated Theorem Proving	Unverified
Large Language Models' Understanding of Math: Source Criticism and Extrapolation	Nov 12, 2023	Automated Theorem Proving, Math	Unverified
Generative Learning of Continuous Data by Tensor Networks	Oct 31, 2023	Automated Theorem Proving, Tensor Networks	Unverified
math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories	Oct 25, 2023	Automated Theorem Proving, Language Modeling	Unverified
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models	Oct 16, 2023	Automated Theorem Proving, Benchmarking	Code Available
The Mathematical Game	Sep 22, 2023	Automated Theorem Proving	Unverified
Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics	Jul 4, 2023	Automated Theorem Proving, Math	Unverified
Theorem Proving in Dependently-Typed Higher-Order Logic -- Extended Preprint	May 24, 2023	Automated Theorem Proving, Translation	Code Available
Translating SUMO-K to Higher-Order Set Theory	May 13, 2023	Automated Theorem Proving, Common Sense Reasoning	Unverified


Datasets (the benchmark tables below follow this order): miniF2F-test, miniF2F-valid, HolStep (Conditional), HOList benchmark, HolStep (Unconditional), Metamath set.mm, miniF2F-curriculum, CompCert, CoqGym

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Kimina-Prover-Preview	cumulative	80.74	—	Unverified
2	ProofAug	cumulative	66	—	Unverified
3	DeepSeek-Prover-V1.5	cumulative	63.5	—	Unverified
4	Subgoal-XL	cumulative	56.1	—	Unverified
5	DeepSeek-Prover	cumulative	52	—	Unverified
6	Lyra + GPT-4	cumulative	47.1	—	Unverified
7	LEGO-Prover ChatGPT	cumulative	47.1	—	Unverified
8	Decomposing the Enigma	cumulative	45.5	—	Unverified
9	Evariste	cumulative	41	—	Unverified
10	Evariste-7d	cumulative	40.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@64	58.6	—	Unverified
2	LEGO-Prover ChatGPT	Pass@100	57	—	Unverified
3	Lyra + GPT-4	Pass@100	52	—	Unverified
4	Evariste-7d	Pass@64	47.5	—	Unverified
5	GPT-f	Pass@64	47.3	—	Unverified
6	Evariste-1d	Pass@64	46.7	—	Unverified
7	DSP (62B Minerva informal)	Pass@100	43.9	—	Unverified
8	Lean GPT-f	Pass@8	29.3	—	Unverified
9	Lean tidy	Pass@1	16.8	—	Unverified
10	Metamath GPT-f	Pass@8	2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MPNN-DagLSTM	Classification Accuracy	0.92	—	Unverified
2	FormulaNet	Classification Accuracy	0.9	—	Unverified
3	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
4	Siamese 1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified
5	Siamese 1D CNN	Classification Accuracy	0.82	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	4-hop GNN, sub-expression sharing	Percentage correct	49.95	—	Unverified
2	Tactic Dependent Loop	Percentage correct	38.88	—	Unverified
3	BoW2 (extra -ves)	Percentage correct	36.55	—	Unverified
4	Deeper Wider WaveNet	Percentage correct	32.65	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	FormulaNet	Classification Accuracy	0.9	—	Unverified
2	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
3	1D CNN	Classification Accuracy	0.83	—	Unverified
4	1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@32	72.4	—	Unverified
2	GPT-f	Percentage correct	56.2	—	Unverified
3	MetaGen-IL + Holophrasm	Percentage correct	22.1	—	Unverified
4	Holophrasm	Percentage correct	14.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Evariste-7d	Pass@64	42.5	—	Unverified
2	Evariste-1d	Pass@64	33.6	—	Unverified
3	Evariste	Pass@64	32.1	—	Unverified
4	GPT-f	Pass@64	30.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Proverbot9001	Percentage correct	19.36	—	Unverified
2	CoqGym/ASTactic	Percentage correct	4.99	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ASTactic	Percentage correct	12.2	—	Unverified
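
Several of the tables above report a Pass@k metric (Pass@1, Pass@8, Pass@32, Pass@64, Pass@100). As a rough sketch only, the snippet below assumes Pass@k means the percentage of theorems for which at least one of the first k proof attempts succeeds; the individual papers may define an attempt differently or use a statistical estimator instead. The function name `pass_at_k` and the theorem names in the sample data are hypothetical.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Percentage of theorems proved by at least one of the first k attempts.

    `attempts` maps a theorem name to the ordered outcomes of its proof
    attempts (True means the proof was accepted by the checker).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not attempts:
        raise ValueError("attempts must not be empty")
    # A theorem counts as solved if any of its first k attempts succeeded.
    solved = sum(1 for outcomes in attempts.values() if any(outcomes[:k]))
    return 100.0 * solved / len(attempts)


# Made-up outcomes for four hypothetical theorems, four attempts each.
sample = {
    "thm_a": [False, True, False, False],
    "thm_b": [False, False, False, True],
    "thm_c": [False, False, False, False],
    "thm_d": [True, False, False, False],
}

for k in (1, 2, 4):
    print(f"Pass@{k}: {pass_at_k(sample, k):.1f}")
```

Under this reading, scores obtained with different values of k are not directly comparable, since a larger attempt budget can only keep the score the same or raise it.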