SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 51100 of 288 papers

TitleStatusHype
AI Descartes: Combining Data and Theory for Derivable Scientific DiscoveryCode1
MiniF2F: a cross-system benchmark for formal Olympiad-level mathematicsCode1
ProoFVer: Natural Logic Theorem Proving for Fact VerificationCode1
Learning Theorem Proving ComponentsCode1
NaturalProofs: Mathematical Theorem Proving in Natural LanguageCode1
Proof Artifact Co-training for Theorem Proving with Language ModelsCode1
Learning as Abduction: Trainable Natural Logic Theorem Prover for Natural Language InferenceCode1
Measuring Systematic Generalization in Neural Proof Generation with TransformersCode1
INT: An Inequality Benchmark for Evaluating Generalization in Theorem ProvingCode1
Logical Neural NetworksCode1
Logical Inferences with Comparatives and Generalized QuantifiersCode1
Prolog Technology Reinforcement Learning ProverCode1
Learning to Prove Theorems by Learning to Generate TheoremsCode1
A Deep Reinforcement Learning Approach to First-Order Logic Theorem ProvingCode1
LangPro: Natural Language Theorem ProverCode1
Prover Agent: An Agent-based Framework for Formal Mathematical Proofs0
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving0
MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems?0
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening0
Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations0
ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction0
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation0
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math CompetitionsCode0
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement0
MIRB: Mathematical Information Retrieval BenchmarkCode0
LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data GenerationCode0
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation0
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning0
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving0
Proceedings The 13th International Workshop on Theorem proving components for Educational software0
The Limits of AI Explainability: An Algorithmic Information Theory Approach0
Hua-Chen New Theory of Economic Optimization0
APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries0
Hierarchical Attention Generates Better ProofsCode0
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification0
Reasoning Models Can Be Effective Without Thinking0
Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection0
Reasoning Under Threat: Symbolic and Neural Techniques for Cybersecurity Verification0
A Survey on Mathematical Reasoning and Optimization with Large Language ModelsCode0
Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview0
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving0
Efficient Neural Clause-Selection Reinforcement0
Faithful Logic Embeddings in HOL -- Deep and Shallow0
Quantum Machine Learning in Precision Medicine and Drug Discovery -- A Game Changer for Tailored Treatments?0
LeanProgress: Guiding Search for Neural Theorem Proving via Proof Progress Prediction0
A Combinatorial Identities Benchmark for Theorem Proving via Automated Theorem Generation0
Activation Steering in Neural Theorem Provers0
Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs0
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation0
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving0
Show:102550
← PrevPage 2 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNNClassification Accuracy0.83Unverified
41D CNN-LSTMClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified