SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 51100 of 288 papers

TitleStatusHype
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective0
Proof Recommendation System for the HOL4 Theorem Prover0
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving0
Formal Mathematical Reasoning: A New Frontier in AI0
Towards Scientific Discovery with Generative AI: Progress, Opportunities, and Challenges0
WithdrarXiv: A Large-Scale Dataset for Retraction StudyCode0
Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring0
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically0
Learning Rules Explaining Interactive Theorem Proving Tactic PredictionCode0
Beyond Autoregression: Fast LLMs via Self-Distillation Through TimeCode1
Pantograph: A Machine-to-Machine Interaction Interface for Advanced Theorem Proving, High Level Reasoning, and Data Extraction in Lean 4Code2
Alchemy: Amplifying Theorem-Proving Capability through Symbolic MutationCode0
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN ProblemsCode4
Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning0
3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes0
LeanAgent: Lifelong Learning for Formal Theorem ProvingCode2
Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4Code0
SubgoalXL: Subgoal-based Expert Learning for Theorem ProvingCode1
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree SearchCode4
Revealed Invariant Preference0
miniCTX: Neural Theorem Proving with (Long-)ContextsCode4
Artifical intelligence and inherent mathematical difficulty0
LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN proverCode4
PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical CompetitionCode3
Lean-STaR: Learning to Interleave Thinking and Proving0
Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent0
TheoremLlama: Transforming General-Purpose LLMs into Lean4 ExpertsCode1
Learning Formal Mathematics From Intrinsic MotivationCode2
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem ProvingCode0
Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative GrammarsCode0
miniCodeProps: a Minimal Benchmark for Proving Code Properties0
Lean Workbook: A large-scale Lean problem set formalized from natural language math problemsCode4
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data0
Proving Theorems RecursivelyCode1
A Certified Proof Checker for Deep Neural Network Verification in Imandra0
ATG: Benchmarking Automated Theorem Generation for Generative Language Models0
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem ProvingCode0
Lean Copilot: Large Language Models as Copilots for Theorem Proving in LeanCode5
A Survey on Deep Learning for Theorem ProvingCode3
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic ProvingCode0
Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry0
Proceedings 12th International Workshop on Theorem proving components for Educational software0
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with AutoformalizationCode1
Multi-Task Learning with Multi-Task Optimization0
LeanReasoner: Boosting Complex Logical Reasoning with LeanCode1
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code0
Learning Guided Automated Reasoning: A Brief Survey0
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving0
A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic0
On the (In)feasibility of ML Backdoor Detection as an Hypothesis Testing ProblemCode0
Show:102550
← PrevPage 2 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNN-LSTMClassification Accuracy0.83Unverified
41D CNNClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified