SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 150 of 288 papers

TitleStatusHype
Autoformalization in the Era of Large Language Models: A SurveyCode5
DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal DecompositionCode5
Lean Copilot: Large Language Models as Copilots for Theorem Proving in LeanCode5
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree SearchCode4
Lean Workbook: A large-scale Lean problem set formalized from natural language math problemsCode4
miniCTX: Neural Theorem Proving with (Long-)ContextsCode4
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN proverCode4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN ProblemsCode4
A Survey on Deep Learning for Theorem ProvingCode3
Llemma: An Open Language Model For MathematicsCode3
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem ProvingCode3
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical CompetitionCode3
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement LearningCode3
An Empirical Study of Data Ability Boundary in LLMs' Math ReasoningCode2
LeanDojo: Theorem Proving with Retrieval-Augmented Language ModelsCode2
LeanAgent: Lifelong Learning for Formal Theorem ProvingCode2
Learning Formal Mathematics From Intrinsic MotivationCode2
Formal Mathematics Statement Curriculum LearningCode2
LeanExplore: A search engine for Lean 4 declarationsCode2
STP: Self-play LLM Theorem Provers with Iterative Conjecturing and ProvingCode2
Pantograph: A Machine-to-Machine Interaction Interface for Advanced Theorem Proving, High Level Reasoning, and Data Extraction in Lean 4Code2
Parsel: Algorithmic Reasoning with Language Models by Composing DecompositionsCode2
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof DataCode1
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematicsCode1
NaturalProofs: Mathematical Theorem Proving in Natural LanguageCode1
Lyra: Orchestrating Dual Correction in Automated Theorem ProvingCode1
Logical Neural NetworksCode1
MA-LoT: Multi-Agent Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem ProvingCode1
Linear algebra with transformersCode1
LEGO-Prover: Neural Theorem Proving with Growing LibrariesCode1
Logical Inferences with Comparatives and Generalized QuantifiersCode1
Measuring Systematic Generalization in Neural Proof Generation with TransformersCode1
NaturalProver: Grounded Mathematical Proof Generation with Language ModelsCode1
LeanReasoner: Boosting Complex Logical Reasoning with LeanCode1
Learning as Abduction: Trainable Natural Logic Theorem Prover for Natural Language InferenceCode1
Leanabell-Prover: Posttraining Scaling in Formal ReasoningCode1
AI Descartes: Combining Data and Theory for Derivable Scientific DiscoveryCode1
INT: An Inequality Benchmark for Evaluating Generalization in Theorem ProvingCode1
Learning Theorem Proving ComponentsCode1
Beyond Autoregression: Fast LLMs via Self-Distillation Through TimeCode1
FIMO: A Challenge Formal Dataset for Automated Theorem ProvingCode1
LangPro: Natural Language Theorem ProverCode1
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal ProofsCode1
A Deep Reinforcement Learning Approach to First-Order Logic Theorem ProvingCode1
Efficient Neural Theorem Proving via Fine-grained Proof Structure AnalysisCode1
An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural RepresentationsCode1
An In-Context Learning Agent for Formal Theorem-ProvingCode1
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem ProvingCode1
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement LearningCode1
CriticLean: Critic-Guided Reinforcement Learning for Mathematical FormalizationCode1
Show:102550
← PrevPage 1 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNNClassification Accuracy0.83Unverified
41D CNN-LSTMClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified