Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems
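
As a minimal illustration of the setup described above, the Lean 4 sketch below treats three known facts as hypotheses (the knowledge base) and the statement `r` as the conjecture. The proposition names `p`, `q`, `r` and the hypothesis names `hp`, `hpq`, `hqr` are assumptions chosen for illustration and do not come from the cited paper. An automated prover would be expected to find the proof term on the last line by itself.

```lean
-- Knowledge base: hp, hpq, hqr are the known facts.
-- Conjecture: r.
theorem r_from_facts (p q r : Prop)
    (hp : p) (hpq : p → q) (hqr : q → r) : r :=
  -- Proof: chain the two implications starting from the fact hp.
  hqr (hpq hp)
```

The proof assistant checks the term mechanically, which is why a generated proof can be trusted without trusting the system that produced it.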

Papers

Showing 51–100 of 288 papers

Title	Date	Tasks	Status	Hype
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective	Jan 19, 2025	Automated Theorem Proving, Math	Unverified	0
Proof Recommendation System for the HOL4 Theorem Prover	Dec 31, 2024	Automated Theorem Proving, Recommendation Systems	Unverified	0
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving	Dec 30, 2024	Automated Theorem Proving, Language Modeling	Unverified	0
Formal Mathematical Reasoning: A New Frontier in AI	Dec 20, 2024	Automated Theorem Proving, Math	Unverified	0
Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges	Dec 16, 2024	Automated Theorem Proving, scientific discovery	Unverified	0
WithdrarXiv: A Large-Scale Dataset for Retraction Study	Dec 4, 2024	Automated Theorem Proving, Claim Verification	Code Available	0
Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring	Dec 1, 2024	Automated Theorem Proving, Geometry Problem Solving	Unverified	0
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically	Nov 4, 2024	Automated Theorem Proving	Unverified	0
Learning Rules Explaining Interactive Theorem Proving Tactic Prediction	Nov 2, 2024	Automated Theorem Proving, Inductive logic programming	Code Available	0
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time	Oct 28, 2024	Automated Theorem Proving, Code Generation	Code Available	1
Pantograph: A Machine-to-Machine Interaction Interface for Advanced Theorem Proving, High Level Reasoning, and Data Extraction in Lean 4	Oct 21, 2024	Automated Theorem Proving	Code Available	2
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation	Oct 21, 2024	Automated Theorem Proving, Continual Pretraining	Code Available	0
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems	Oct 21, 2024	Automated Theorem Proving, CPU	Code Available	4
Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning	Oct 17, 2024	Automated Theorem Proving, Language Modeling	Unverified	0
3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes	Oct 14, 2024	Automated Theorem Proving, Diversity	Unverified	0
LeanAgent: Lifelong Learning for Formal Theorem Proving	Oct 8, 2024	Abstract Algebra, Automated Theorem Proving	Code Available	2
Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4	Sep 9, 2024	Abstract Algebra, Automated Theorem Proving	Code Available	0
SubgoalXL: Subgoal-based Expert Learning for Theorem Proving	Aug 20, 2024	Automated Theorem Proving	Code Available	1
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search	Aug 15, 2024	Automated Theorem Proving, Language Modeling	Code Available	4
Revealed Invariant Preference	Aug 8, 2024	Automated Theorem Proving	Unverified	0
miniCTX: Neural Theorem Proving with (Long-)Contexts	Aug 5, 2024	Automated Theorem Proving	Code Available	4
Artificial intelligence and inherent mathematical difficulty	Aug 1, 2024	Automated Theorem Proving	Unverified	0
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover	Jul 24, 2024	Automated Theorem Proving, Math	Code Available	4
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition	Jul 15, 2024	Automated Theorem Proving	Code Available	3
Lean-STaR: Learning to Interleave Thinking and Proving	Jul 14, 2024	Automated Theorem Proving, Language Modeling	Unverified	0
Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent	Jul 5, 2024	Automated Theorem Proving, In-Context Learning	Unverified	0
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts	Jul 3, 2024	Automated Theorem Proving, Code Generation	Code Available	1
Learning Formal Mathematics From Intrinsic Motivation	Jun 30, 2024	Automated Theorem Proving, Language Modeling	Code Available	2
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving	Jun 20, 2024	Automated Theorem Proving, Program Synthesis	Code Available	0
Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative Grammars	Jun 16, 2024	Automated Theorem Proving, Logical Reasoning	Code Available	0
miniCodeProps: a Minimal Benchmark for Proving Code Properties	Jun 16, 2024	AI Agent, Automated Theorem Proving	Unverified	0
Lean Workbook: A large-scale Lean problem set formalized from natural language math problems	Jun 6, 2024	Automated Theorem Proving, Math	Code Available	4
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data	May 23, 2024	Automated Theorem Proving, Mathematical Reasoning	Unverified	0
Proving Theorems Recursively	May 23, 2024	Automated Theorem Proving	Code Available	1
A Certified Proof Checker for Deep Neural Network Verification in Imandra	May 17, 2024	Automated Theorem Proving, LEMMA	Unverified	0
ATG: Benchmarking Automated Theorem Generation for Generative Language Models	May 5, 2024	Automated Theorem Proving, Benchmarking	Unverified	0
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving	May 2, 2024	Automated Theorem Proving, Natural Language Inference	Code Available	0
Lean Copilot: Large Language Models as Copilots for Theorem Proving in Lean	Apr 18, 2024	Automated Theorem Proving, Hallucination	Code Available	5
A Survey on Deep Learning for Theorem Proving	Apr 15, 2024	Automated Theorem Proving, Deep Learning	Code Available	3
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving	Apr 10, 2024	Automated Theorem Proving, Language Modeling	Code Available	0
Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry	Apr 9, 2024	Automated Theorem Proving, CPU	Unverified	0
Proceedings 12th International Workshop on Theorem proving components for Educational software	Apr 4, 2024	Automated Theorem Proving	Unverified	0
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization	Mar 26, 2024	Automated Theorem Proving, GSM8K	Code Available	1
Multi-Task Learning with Multi-Task Optimization	Mar 24, 2024	Automated Theorem Proving, image-classification	Unverified	0
LeanReasoner: Boosting Complex Logical Reasoning with Lean	Mar 20, 2024	Automated Theorem Proving, Logical Reasoning	Code Available	1
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code	Mar 19, 2024	Automated Theorem Proving, Code Generation	Unverified	0
Learning Guided Automated Reasoning: A Brief Survey	Mar 6, 2024	Automated Theorem Proving, Logical Reasoning	Unverified	0
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving	Mar 6, 2024	Automated Theorem Proving, Benchmarking	Unverified	0
A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic	Feb 28, 2024	Automated Theorem Proving, Information Retrieval	Unverified	0
On the (In)feasibility of ML Backdoor Detection as an Hypothesis Testing Problem	Feb 26, 2024	Automated Theorem Proving, Out-of-Distribution Detection	Code Available	0

Datasets: miniF2F-test, miniF2F-valid, HolStep (Conditional), HOList benchmark, HolStep (Unconditional), Metamath set.mm, miniF2F-curriculum, CompCert, CoqGym

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Kimina-Prover-Preview	cumulative	80.74	—	Unverified
2	ProofAug	cumulative	66	—	Unverified
3	DeepSeek-Prover-V1.5	cumulative	63.5	—	Unverified
4	Subgoal-XL	cumulative	56.1	—	Unverified
5	DeepSeek-Prover	cumulative	52	—	Unverified
6	Lyra + GPT-4	cumulative	47.1	—	Unverified
7	LEGO-Prover ChatGPT	cumulative	47.1	—	Unverified
8	Decomposing the Enigma	cumulative	45.5	—	Unverified
9	Evariste	cumulative	41	—	Unverified
10	Evariste-7d	cumulative	40.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@64	58.6	—	Unverified
2	LEGO-Prover ChatGPT	Pass@100	57	—	Unverified
3	Lyra + GPT-4	Pass@100	52	—	Unverified
4	Evariste-7d	Pass@64	47.5	—	Unverified
5	GPT-f	Pass@64	47.3	—	Unverified
6	Evariste-1d	Pass@64	46.7	—	Unverified
7	DSP (62B Minerva informal)	Pass@100	43.9	—	Unverified
8	Lean GPT-f	Pass@8	29.3	—	Unverified
9	Lean tidy	Pass@1	16.8	—	Unverified
10	Metamath GPT-f	Pass@8	2	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	MPNN-DagLSTM	Classification Accuracy	0.92	—	Unverified
2	FormulaNet	Classification Accuracy	0.9	—	Unverified
3	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
4	Siamese 1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified
5	Siamese 1D CNN	Classification Accuracy	0.82	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	4-hop GNN, sub-expression sharing	Percentage correct	49.95	—	Unverified
2	Tactic Dependent Loop	Percentage correct	38.88	—	Unverified
3	BoW2 (extra -ves)	Percentage correct	36.55	—	Unverified
4	Deeper Wider WaveNet	Percentage correct	32.65	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	FormulaNet	Classification Accuracy	0.9	—	Unverified
2	FormulaNet-basic	Classification Accuracy	0.89	—	Unverified
3	1D CNN	Classification Accuracy	0.83	—	Unverified
4	1D CNN-LSTM	Classification Accuracy	0.83	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Evariste	Pass@32	72.4	—	Unverified
2	GPT-f	Percentage correct	56.2	—	Unverified
3	MetaGen-IL + Holophrasm	Percentage correct	22.1	—	Unverified
4	Holophrasm	Percentage correct	14.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Evariste-7d	Pass@64	42.5	—	Unverified
2	Evariste-1d	Pass@64	33.6	—	Unverified
3	Evariste	Pass@64	32.1	—	Unverified
4	GPT-f	Pass@64	30.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Proverbot9001	Percentage correct	19.36	—	Unverified
2	CoqGym/ASTactic	Percentage correct	4.99	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ASTactic	Percentage correct	12.2	—	Unverified