HyperTree Proof Search for Neural Theorem Proving

2022-05-23

Guillaume Lample, Marie-Anne Lachaux, Thibaut Lavril, Xavier Martinet, Amaury Hayat, Gabriel Ebner, Aurélien Rodriguez, Timothée Lacroix

Abstract

We propose an online training procedure for a transformer-based automated theorem prover. Our approach leverages a new search algorithm, HyperTree Proof Search (HTPS), inspired by the recent success of AlphaZero. Our model learns from previous proof searches through online training, allowing it to generalize to domains far from the training distribution. We report detailed ablations of our pipeline's main components by studying performance on three environments of increasing complexity. In particular, we show that with HTPS alone, a model trained on annotated proofs manages to prove 65.4% of a held-out set of Metamath theorems, significantly outperforming the previous state of the art of 56.5% by GPT-f. Online training on these unproved theorems increases accuracy to 82.6%. With a similar computational budget, we improve the state of the art on the Lean-based miniF2F-curriculum dataset from 31% to 42% proving accuracy.
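As a rough illustration of the "hypertree" idea behind the search algorithm named in the abstract: a single tactic may turn a goal into several subgoals that must all be proved, while a goal needs only one of its tactics to succeed. The sketch below is a toy, not the authors' implementation. The `HyperGraph` type, the `proved_goals` function, and the example goals are hypothetical names chosen for illustration, and the policy model, critic, and online training loop are left out entirely.

```python
from typing import Dict, List, Set

# Hypothetical toy hypergraph: each goal maps to a list of tactics, and each
# tactic is represented by the list of subgoals it produces.
# An empty subgoal list means the tactic closes the goal outright.
HyperGraph = Dict[str, List[List[str]]]


def proved_goals(graph: HyperGraph) -> Set[str]:
    """Return every goal that is proved, computed by fixed-point iteration."""
    proved: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for goal, tactics in graph.items():
            if goal in proved:
                continue
            # OR over tactics, AND over the subgoals of each tactic.
            if any(all(sub in proved for sub in subgoals) for subgoals in tactics):
                proved.add(goal)
                changed = True
    return proved


# Assumed example: "root" has two candidate tactics.
toy: HyperGraph = {
    "root": [["a", "b"], ["c"]],  # tactic 1 needs a and b; tactic 2 needs only c
    "a": [[]],                    # closed directly
    "b": [["d"]],                 # depends on d
    "c": [[]],                    # closed directly
    "d": [],                      # no tactic found, so it stays unproved
}
result = proved_goals(toy)
```

Here `root` ends up proved through its second tactic even though the first one fails at `d`, which is the reason a search over such a structure can keep several partial proofs alive at once.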

Tasks

Automated Theorem Proving

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Metamath set.mm	Evariste	Pass@32	72.4	—	Unverified
miniF2F-curriculum	Evariste-7d	Pass@64	42.5	—	Unverified
miniF2F-curriculum	GPT-f	Pass@64	30.6	—	Unverified
miniF2F-curriculum	Evariste	Pass@64	32.1	—	Unverified
miniF2F-curriculum	Evariste-1d	Pass@64	33.6	—	Unverified
miniF2F-test	GPT-f	cumulative	36.6	—	Unverified
miniF2F-test	Evariste	cumulative	41	—	Unverified
miniF2F-test	Evariste-7d	cumulative	40.6	—	Unverified
miniF2F-test	Evariste-1d	cumulative	38.9	—	Unverified
miniF2F-valid	GPT-f	Pass@64	47.3	—	Unverified
miniF2F-valid	Evariste-1d	Pass@64	46.7	—	Unverified
miniF2F-valid	Evariste-7d	Pass@64	47.5	—	Unverified
miniF2F-valid	Evariste	Pass@64	58.6	—	Unverified