SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 101150 of 288 papers

TitleStatusHype
Logically Consistent Adversarial Attacks for Soft Theorem ProversCode0
Towards Finding Longer ProofsCode0
Alchemy: Amplifying Theorem-Proving Capability through Symbolic MutationCode0
Aplib: Tactical Programming of Intelligent AgentsCode0
Guiding Inferences in Connection Tableau by Recurrent Neural NetworksCode0
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language ModelsCode0
Automated Completion of Statements and Proofs in Synthetic Geometry: an Approach based on Constraint SolvingCode0
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem ProvingCode0
WithdrarXiv: A Large-Scale Dataset for Retraction StudyCode0
MIRB: Mathematical Information Retrieval BenchmarkCode0
NIL: Learning Nonlinear Interpolants0
On Quantified Modal Theorem Proving for Modeling Ethics0
Performance Guarantees for Schatten-p Quasi-Norm Minimization in Recovery of Low-Rank Matrices0
Planning as Theorem Proving with Heuristics0
Probabilistic unifying relations for modelling epistemic and aleatoric uncertainty: semantics and automated reasoning with theorem proving0
Proceedings 10th International Workshop on Theorem Proving Components for Educational Software0
Proceedings 11th International Workshop on Theorem Proving Components for Educational Software0
Proceedings 12th International Workshop on Theorem proving components for Educational software0
Proceedings 35th International Conference on Logic Programming (Technical Communications)0
Proceedings 36th International Conference on Logic Programming (Technical Communications)0
Proceedings 37th International Conference on Logic Programming (Technical Communications)0
Proceedings 38th International Conference on Logic Programming0
Proceedings 6th International Workshop on Theorem proving components for Educational software0
Proceedings 7th International Workshop on Theorem proving components for Educational software0
Proceedings 8th International Workshop on Theorem Proving Components for Educational Software0
Proceedings 9th International Workshop on Theorem Proving Components for Educational Software0
Proceedings of the 13th International Conference on Automated Deduction in Geometry0
Proceedings of the Fifth Workshop on Proof eXchange for Theorem Proving0
Proceedings The 13th International Workshop on Theorem proving components for Educational software0
Proof Extraction for Logical Neural Networks0
Proof Flow: Preliminary Study on Generative Flow Network Language Model Tuning for Formal Reasoning0
ProofNet++: A Neuro-Symbolic System for Formal Proof Verification with Self-Correction0
Proof Recommendation System for the HOL4 Theorem Prover0
Property Invariant Embedding for Automated Reasoning0
Prover Agent: An Agent-based Framework for Formal Mathematical Proofs0
Proving the Coding Interview: A Benchmark for Formally Verified Code Generation0
Proving Theorems using Incremental Learning and Hindsight Experience Replay0
Quantum Machine Learning in Precision Medicine and Drug Discovery -- A Game Changer for Tailored Treatments?0
Rare Speed-up in Automatic Theorem Proving Reveals Tradeoff Between Computational Time and Information Value0
READ-EVAL-PRINT in Parallel and Asynchronous Proof-checking0
Reasoning Models Can Be Effective Without Thinking0
Reasoning Under Threat: Symbolic and Neural Techniques for Cybersecurity Verification0
Reinforcement Learning of Theorem Proving0
Revealed Invariant Preference0
Review on DNA Strand Algebra and its Application0
Rewarding the Unlikely: Lifting GRPO Beyond Distribution Sharpening0
Robust Computer Algebra, Theorem Proving, and Oracle AI0
RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation0
Scalable Neural Theorem Proving on Knowledge Bases and Natural Language0
Selection Strategies for Commonsense Knowledge0
Show:102550
← PrevPage 3 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNNClassification Accuracy0.83Unverified
41D CNN-LSTMClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified