SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 201250 of 288 papers

TitleStatusHype
Automated Reasoning in Non-classical Logics in the TPTP World0
Automated Theorem Proving in Intuitionistic Propositional Logic by Deep Reinforcement Learning0
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving0
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving0
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving0
BliStrTune: Hierarchical Invention of Theorem Proving Strategies0
Blocking and Other Enhancements for Bottom-Up Model Generation Methods0
Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models0
CD Tools -- Condensed Detachment and Structure Generating Theorem Proving (System Description)0
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective0
Conjectures, Tests and Proofs: An Overview of Theory Exploration0
Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning0
Constrained Training of Neural Networks via Theorem Proving0
Count-Invariance Including Exponentials0
COVER: Covering the Semantically Tractable Questions0
Cross-checking WordNet and SUMO Using Meronymy0
CSPLib: Twenty Years On0
DeepAlgebra - an outline of a program0
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data0
Deriving Theorems in Implicational Linear Logic, Declaratively0
Designing Game of Theorems0
Efficient Neural Clause-Selection Reinforcement0
Modelling Value-oriented Legal Reasoning in LogiKEy0
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code0
Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection0
Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method0
ENIGMA-NG: Efficient Neural and Gradient-Boosted Inference Guidance for E0
EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry0
Evaluating an Automata Approach to Query Containment0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages0
Exploring Length Generalization in Large Language Models0
Extracting Higher-Order Goals from the Mizar Mathematical Library0
Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations0
Faithful Logic Embeddings in HOL -- Deep and Shallow0
Formal Mathematical Reasoning: A New Frontier in AI0
Formal Specifications from Natural Language0
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically0
From the String Landscape to the Mathematical Landscape: a Machine-Learning Outlook0
From Width-Based Model Checking to Width-Based Automated Theorem Proving0
Generating Compressed Combinatory Proof Structures -- An Approach to Automated First-Order Theorem Proving0
Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs0
Generative Language Modeling for Automated Theorem Proving0
Generative Learning of Continuous Data by Tensor Networks0
GeoGebra Tools with Proof Capabilities0
GraATP: A Graph Theoretic Approach for Automated Theorem Proving in Plane Geometry0
Graph2Tac: Online Representation Learning of Formal Math Concepts0
Graph Contrastive Pre-training for Effective Theorem Reasoning0
Graph Representations for Higher-Order Logic and Theorem Proving0
Hammering Mizar by Learning Clause Guidance0
Hua-Chen New Theory of Economic Optimization0
Show:102550
← PrevPage 5 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNNClassification Accuracy0.83Unverified
41D CNN-LSTMClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified