SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 101125 of 288 papers

TitleStatusHype
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math CompetitionsCode0
Solving Quantified Modal Logic Problems by Translation to Classical LogicsCode0
Learning Symbolic Rules for Reasoning in Quasi-Natural LanguageCode0
Automated proof synthesis for propositional logic with deep neural networksCode0
Holophrasm: a neural Automated Theorem Prover for higher-order logicCode0
HolStep: A Machine Learning Dataset for Higher-order Logic Theorem ProvingCode0
HOL(y)Hammer: Online ATP Service for HOL LightCode0
Theorem Proving in Dependently-Typed Higher-Order Logic -- Extended PreprintCode0
Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint SolvingCode0
Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic ProvingCode0
Large Language Models' Understanding of Math: Source Criticism and Extrapolation0
Deriving Theorems in Implicational Linear Logic, Declaratively0
Keyword-based Natural Language Premise Selection for an Automatic Mathematical Statement Proving0
jsCoq: Towards Hybrid Theorem Proving Interfaces0
ATG: Benchmarking Automated Theorem Generation for Generative Language Models0
Interactive, Intelligent Tutoring for Auxiliary Constructions in Geometry Proofs0
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data0
Improving QED-Tutrix by Automating the Generation of Proofs0
Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring0
HyperTree Proof Search for Neural Theorem Proving0
DeepAlgebra - an outline of a program0
Analysis of Algorithms and Partial Algorithms0
Activation Steering in Neural Theorem Provers0
HybridProver: Augmenting Theorem Proving with LLM-Driven Proof Synthesis and Refinement0
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving0
Show:102550
← PrevPage 5 of 12Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNNClassification Accuracy0.83Unverified
41D CNN-LSTMClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified