SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 51100 of 288 papers

TitleStatusHype
AI Descartes: Combining Data and Theory for Derivable Scientific DiscoveryCode1
Prolog Technology Reinforcement Learning ProverCode1
Proof Artifact Co-training for Theorem Proving with Language ModelsCode1
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem ProvingCode1
An In-Context Learning Agent for Formal Theorem-ProvingCode1
Neural Unification for Logic Reasoning over Natural LanguageCode1
TheoremLlama: Transforming General-Purpose LLMs into Lean4 ExpertsCode1
Proving Theorems RecursivelyCode1
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal ProofsCode1
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with AutoformalizationCode1
An energy-based model for neuro-symbolic reasoning on knowledge graphsCode1
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement LearningCode1
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on InequalitiesCode1
SubgoalXL: Subgoal-based Expert Learning for Theorem ProvingCode1
Measuring Systematic Generalization in Neural Proof Generation with TransformersCode1
Scaling Synthetic Logical Reasoning Datasets with Context-Sensitive Declarative GrammarsCode0
A Survey on Mathematical Reasoning and Optimization with Large Language ModelsCode0
Deep Reinforcement Learning for Synthesizing Functions in Higher-Order LogicCode0
REFACTOR: Learning to Extract Theorems from ProofsCode0
DeepMath - Deep Sequence Models for Premise SelectionCode0
Solving Geometry Problems: Combining Text and Diagram InterpretationCode0
Alchemy: Amplifying Theorem-Proving Capability through Symbolic MutationCode0
OxKBC: Outcome Explanation for Factorization Based Knowledge Base CompletionCode0
Neural Theorem Provers Do Not Learn Rules Without ExplorationCode0
On-demand Injection of Lexical Knowledge for Recognising Textual EntailmentCode0
On the (In)feasibility of ML Backdoor Detection as an Hypothesis Testing ProblemCode0
Premise Selection for Theorem Proving by Deep Graph EmbeddingCode0
Solving Quantified Modal Logic Problems by Translation to Classical LogicsCode0
Logically Consistent Adversarial Attacks for Soft Theorem ProversCode0
LLM-based Automated Theorem Proving Hinges on Scalable Synthetic Data GenerationCode0
FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem ProvingCode0
Mathematical Formalized Problem Solving and Theorem Proving in Different Fields in Lean 4Code0
MIRB: Mathematical Information Retrieval BenchmarkCode0
Lectures on Jacques Herbrand as a LogicianCode0
Learning to Prove Theorems via Interacting with Proof AssistantsCode0
Aplib: Tactical Programming of Intelligent AgentsCode0
Lemmas: Generation, Selection, ApplicationCode0
Learning Symbolic Rules for Reasoning in Quasi-Natural LanguageCode0
G2SAT: Learning to Generate SAT FormulasCode0
GamePad: A Learning Environment for Theorem ProvingCode0
Learning Rules Explaining Interactive Theorem Proving Tactic PredictionCode0
Learning to Match Mathematical Statements with ProofsCode0
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math CompetitionsCode0
HolStep: A Machine Learning Dataset for Higher-order Logic Theorem ProvingCode0
Automated proof synthesis for propositional logic with deep neural networksCode0
HOL(y)Hammer: Online ATP Service for HOL LightCode0
Holophrasm: a neural Automated Theorem Prover for higher-order logicCode0
Improving Graph Neural Network Representations of Logical Formulae with Subgraph PoolingCode0
Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint SolvingCode0
Hierarchical Attention Generates Better ProofsCode0
Show:102550
← PrevPage 2 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNNClassification Accuracy0.83Unverified
41D CNN-LSTMClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified