SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 101150 of 288 papers

TitleStatusHype
REFACTOR: Learning to Extract Theorems from ProofsCode0
An Empirical Study of Data Ability Boundary in LLMs' Math ReasoningCode2
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof DataCode1
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages0
0-1 laws for pattern occurrences in phylogenetic trees and networks0
Task Success is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors0
Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint SolvingCode0
Graph2Tac: Online Representation Learning of Formal Math Concepts0
Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method0
Automated Planning Techniques for Elementary Proofs in Abstract Algebra0
Large Language Models' Understanding of Math: Source Criticism and Extrapolation0
Generative Learning of Continuous Data by Tensor Networks0
math-PVS: A Large Language Model Framework to Map Scientific Publications to PVS Theories0
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language ModelsCode0
Llemma: An Open Language Model For MathematicsCode3
An In-Context Learning Agent for Formal Theorem-ProvingCode1
LEGO-Prover: Neural Theorem Proving with Growing LibrariesCode1
Lyra: Orchestrating Dual Correction in Automated Theorem ProvingCode1
The Mathematical Game0
FIMO: A Challenge Formal Dataset for Automated Theorem ProvingCode1
Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics0
LeanDojo: Theorem Proving with Retrieval-Augmented Language ModelsCode2
Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem ProvingCode1
Theorem Proving in Dependently-Typed Higher-Order Logic -- Extended PreprintCode0
An Ensemble Approach for Automated Theorem Proving Based on Efficient Name Invariant Graph Neural RepresentationsCode1
Translating SUMO-K to Higher-Order Set Theory0
Planning as Theorem Proving with Heuristics0
Probabilistic unifying relations for modelling epistemic and aleatoric uncertainty: semantics and automated reasoning with theorem proving0
Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models0
Lemmas: Generation, Selection, ApplicationCode0
Proceedings 11th International Workshop on Theorem Proving Components for Educational Software0
Magnushammer: A Transformer-Based Approach to Premise Selection0
ProofNet: Autoformalizing and Formally Proving Undergraduate-Level MathematicsCode1
Anti-unification and Generalization: A Survey0
EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry0
Parsel: Algorithmic Reasoning with Language Models by Composing DecompositionsCode2
Solving Quantified Modal Logic Problems by Translation to Classical LogicsCode0
Peano: Learning Formal Mathematical ReasoningCode1
Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal ProofsCode1
TextGraphs-16 Natural Language Premise Selection Task: Zero-Shot Premise Selection with Prompting Generative Language Models0
Keyword-based Natural Language Premise Selection for an Automatic Mathematical Statement Proving0
Generating Compressed Combinatory Proof Structures -- An Approach to Automated First-Order Theorem Proving0
Proceedings 38th International Conference on Logic Programming0
CD Tools -- Condensed Detachment and Structure Generating Theorem Proving (System Description)0
Learning to Prove Trigonometric Identities0
Exploring Length Generalization in Large Language Models0
Constrained Training of Neural Networks via Theorem Proving0
Formal Specifications from Natural Language0
Learning to Find Proofs and Theorems by Learning to Refine Search Strategies: The Case of Loop Invariant Synthesis0
NaturalProver: Grounded Mathematical Proof Generation with Language ModelsCode1
Show:102550
← PrevPage 3 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNN-LSTMClassification Accuracy0.83Unverified
41D CNNClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified