Automated Theorem Proving
The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems.
Source: Learning to Prove Theorems by Learning to Generate Theorems
Papers
Showing 1–10 of 288 papers
All datasetsminiF2F-testminiF2F-validHolStep (Conditional)HOList benchmarkHolStep (Unconditional)Metamath set.mmminiF2F-curriculumCompCertCoqGym
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Kimina-Prover-Preview | cumulative | 80.74 | — | Unverified |
| 2 | ProofAug | cumulative | 66 | — | Unverified |
| 3 | DeepSeek-Prover-V1.5 | cumulative | 63.5 | — | Unverified |
| 4 | Subgoal-XL | cumulative | 56.1 | — | Unverified |
| 5 | DeepSeek-Prover | cumulative | 52 | — | Unverified |
| 6 | Lyra + GPT-4 | cumulative | 47.1 | — | Unverified |
| 7 | LEGO-Prover ChatGPT | cumulative | 47.1 | — | Unverified |
| 8 | Decomposing the Enigma | cumulative | 45.5 | — | Unverified |
| 9 | Evariste | cumulative | 41 | — | Unverified |
| 10 | Evariste-7d | cumulative | 40.6 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Evariste | Pass@64 | 58.6 | — | Unverified |
| 2 | LEGO-Prover ChatGPT | Pass@100 | 57 | — | Unverified |
| 3 | Lyra + GPT-4 | Pass@100 | 52 | — | Unverified |
| 4 | Evariste-7d | Pass@64 | 47.5 | — | Unverified |
| 5 | GPT-f | Pass@64 | 47.3 | — | Unverified |
| 6 | Evariste-1d | Pass@64 | 46.7 | — | Unverified |
| 7 | DSP (62B Minerva informal) | Pass@100 | 43.9 | — | Unverified |
| 8 | Lean GPT-f | Pass@8 | 29.3 | — | Unverified |
| 9 | Lean tidy | Pass@1 | 16.8 | — | Unverified |
| 10 | Metamath GPT-f | Pass@8 | 2 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MPNN-DagLSTM | Classification Accuracy | 0.92 | — | Unverified |
| 2 | FormulaNet | Classification Accuracy | 0.9 | — | Unverified |
| 3 | FormulaNet-basic | Classification Accuracy | 0.89 | — | Unverified |
| 4 | Siamese 1D CNN-LSTM | Classification Accuracy | 0.83 | — | Unverified |
| 5 | Siamese 1D CNN | Classification Accuracy | 0.82 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | 4-hop GNN, sub-expression sharing | Percentage correct | 49.95 | — | Unverified |
| 2 | Tactic Dependent Loop | Percentage correct | 38.88 | — | Unverified |
| 3 | BoW2 (extra -ves) | Percentage correct | 36.55 | — | Unverified |
| 4 | Deeper Wider WaveNet | Percentage correct | 32.65 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | FormulaNet | Classification Accuracy | 0.9 | — | Unverified |
| 2 | FormulaNet-basic | Classification Accuracy | 0.89 | — | Unverified |
| 3 | 1D CNN-LSTM | Classification Accuracy | 0.83 | — | Unverified |
| 4 | 1D CNN | Classification Accuracy | 0.83 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Evariste | Pass@32 | 72.4 | — | Unverified |
| 2 | GPT-f | Percentage correct | 56.2 | — | Unverified |
| 3 | MetaGen-IL + Holophrasm | Percentage correct | 22.1 | — | Unverified |
| 4 | Holophrasm | Percentage correct | 14.3 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Evariste-7d | Pass@64 | 42.5 | — | Unverified |
| 2 | Evariste-1d | Pass@64 | 33.6 | — | Unverified |
| 3 | Evariste | Pass@64 | 32.1 | — | Unverified |
| 4 | GPT-f | Pass@64 | 30.6 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Proverbot9001 | Percentage correct | 19.36 | — | Unverified |
| 2 | CoqGym/ASTactic | Percentage correct | 4.99 | — | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ASTactic | Percentage correct | 12.2 | — | Unverified |