SOTAVerified

Automated Theorem Proving

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.

Source: Learning to Prove Theorems by Learning to Generate Theorems

Papers

Showing 101150 of 288 papers

TitleStatusHype
Activation Steering in Neural Theorem Provers0
A Curious New Result of Resolution Strategies in Negation-Limited Inverters Problem0
Adversarial Learning to Reason in an Arbitrary Logic0
Analysis of Algorithms and Partial Algorithms0
An Experimental Study of Formula Embeddings for Automated Theorem Proving in First-Order Logic0
Anti-unification and Generalization: A Survey0
APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries0
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning0
Applying Second-Order Quantifier Elimination in Inspecting Gödel's Ontological Proof0
Artifical intelligence and inherent mathematical difficulty0
Artificial Neural Networks that Learn to Satisfy Logic Constraints0
A state vector algebra for algorithmic implementation of second-order logic0
A Study of Continuous Vector Representationsfor Theorem Proving0
ATG: Benchmarking Automated Theorem Generation for Generative Language Models0
Autoformalization with Large Language Models0
Automated Planning Techniques for Elementary Proofs in Abstract Algebra0
Automated Reasoning in Non-classical Logics in the TPTP World0
Automated Theorem Proving in Intuitionistic Propositional Logic by Deep Reinforcement Learning0
BAIT: Benchmarking (Embedding) Architectures for Interactive Theorem-Proving0
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving0
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving0
BliStrTune: Hierarchical Invention of Theorem Proving Strategies0
Blocking and Other Enhancements for Bottom-Up Model Generation Methods0
Can neural networks do arithmetic? A survey on the elementary numerical skills of state-of-the-art deep learning models0
CD Tools -- Condensed Detachment and Structure Generating Theorem Proving (System Description)0
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective0
Conjectures, Tests and Proofs: An Overview of Theory Exploration0
Consistent CCG Parsing over Multiple Sentences for Improved Logical Reasoning0
Constrained Training of Neural Networks via Theorem Proving0
Count-Invariance Including Exponentials0
COVER: Covering the Semantically Tractable Questions0
Cross-checking WordNet and SUMO Using Meronymy0
CSPLib: Twenty Years On0
DeepAlgebra - an outline of a program0
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data0
Deriving Theorems in Implicational Linear Logic, Declaratively0
Designing Game of Theorems0
Efficient Neural Clause-Selection Reinforcement0
Modelling Value-oriented Legal Reasoning in LogiKEy0
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code0
Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection0
Enhancing Neural Theorem Proving through Data Augmentation and Dynamic Sampling Method0
ENIGMA-NG: Efficient Neural and Gradient-Boosted Inference Guidance for E0
EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry0
Evaluating an Automata Approach to Query Containment0
EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages0
Exploring Length Generalization in Large Language Models0
Extracting Higher-Order Goals from the Mizar Mathematical Library0
Faithful and Robust LLM-Driven Theorem Proving for NLI Explanations0
Faithful Logic Embeddings in HOL -- Deep and Shallow0
Show:102550
← PrevPage 3 of 6Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Kimina-Prover-Previewcumulative80.74Unverified
2ProofAugcumulative66Unverified
3DeepSeek-Prover-V1.5cumulative63.5Unverified
4Subgoal-XLcumulative56.1Unverified
5DeepSeek-Provercumulative52Unverified
6Lyra + GPT-4cumulative47.1Unverified
7LEGO-Prover ChatGPTcumulative47.1Unverified
8Decomposing the Enigmacumulative45.5Unverified
9Evaristecumulative41Unverified
10Evariste-7dcumulative40.6Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@6458.6Unverified
2LEGO-Prover ChatGPTPass@10057Unverified
3Lyra + GPT-4Pass@10052Unverified
4Evariste-7dPass@6447.5Unverified
5GPT-fPass@6447.3Unverified
6Evariste-1dPass@6446.7Unverified
7DSP (62B Minerva informal)Pass@10043.9Unverified
8Lean GPT-fPass@829.3Unverified
9Lean tidyPass@116.8Unverified
10Metamath GPT-fPass@82Unverified
#ModelMetricClaimedVerifiedStatus
1MPNN-DagLSTMClassification Accuracy0.92Unverified
2FormulaNetClassification Accuracy0.9Unverified
3FormulaNet-basicClassification Accuracy0.89Unverified
4Siamese 1D CNN-LSTMClassification Accuracy0.83Unverified
5Siamese 1D CNNClassification Accuracy0.82Unverified
#ModelMetricClaimedVerifiedStatus
14-hop GNN, sub-expression sharingPercentage correct49.95Unverified
2Tactic Dependent LoopPercentage correct38.88Unverified
3BoW2 (extra -ves)Percentage correct36.55Unverified
4Deeper Wider WaveNetPercentage correct32.65Unverified
#ModelMetricClaimedVerifiedStatus
1FormulaNetClassification Accuracy0.9Unverified
2FormulaNet-basicClassification Accuracy0.89Unverified
31D CNN-LSTMClassification Accuracy0.83Unverified
41D CNNClassification Accuracy0.83Unverified
#ModelMetricClaimedVerifiedStatus
1EvaristePass@3272.4Unverified
2GPT-fPercentage correct56.2Unverified
3MetaGen-IL + HolophrasmPercentage correct22.1Unverified
4HolophrasmPercentage correct14.3Unverified
#ModelMetricClaimedVerifiedStatus
1Evariste-7dPass@6442.5Unverified
2Evariste-1dPass@6433.6Unverified
3EvaristePass@6432.1Unverified
4GPT-fPass@6430.6Unverified
#ModelMetricClaimedVerifiedStatus
1Proverbot9001Percentage correct19.36Unverified
2CoqGym/ASTacticPercentage correct4.99Unverified
#ModelMetricClaimedVerifiedStatus
1ASTacticPercentage correct12.2Unverified