Analysing Mathematical Reasoning Abilities of Neural Models

2019-04-02 · ICLR 2019 · Code Available

David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli


Code

Repository	Framework	Stars
github.com/jlrussin/interpret-math-transformer	pytorch	8
github.com/mesotron/teaching_transformers	none	5
github.com/deepmind/mathematics_dataset (in paper)	none	0
github.com/andrewschreiber/hs-math-nlp	pytorch	0
github.com/mandubian/pytorch_math_dataset	pytorch	0
github.com/r-bakes/math_language_processing	pytorch	0
github.com/berniwal/DeepLearningProject	tf	0

Abstract

Mathematical reasoning, a core ability within human intelligence, presents some unique challenges as a domain: we do not come to understand and solve mathematical problems primarily on the back of experience and evidence, but on the basis of inferring, learning, and exploiting laws, axioms, and symbol manipulation rules. In this paper, we present a new challenge for the evaluation (and eventually the design) of neural architectures and similar systems, developing a task suite of mathematics problems involving sequential questions and answers in a free-form textual input/output format. The structured nature of the mathematics domain, covering arithmetic, algebra, probability and calculus, enables the construction of training and test splits designed to clearly illuminate the capabilities and failure modes of different architectures, as well as evaluate their ability to compose and relate knowledge and learned processes. Having described the data generation process and its potential future expansions, we conduct a comprehensive analysis of models from two broad classes of the most powerful sequence-to-sequence architectures and find notable differences in their ability to resolve mathematical problems and generalize their knowledge.
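
To make the "free-form textual input/output format" concrete, here is a minimal sketch of how question/answer pairs of this kind could be generated procedurally. It is a toy illustration only: the function names (`make_addition_pair`, `make_split`), the question wording, and the operand ranges are all assumptions made for this example and are not taken from the paper or its released code.

```python
import random


def make_addition_pair(rng, max_value):
    """Return one (question, answer) pair as plain strings.

    Hypothetical helper: the wording and operand range are assumptions.
    """
    a = rng.randint(0, max_value)
    b = rng.randint(0, max_value)
    question = f"What is {a} + {b}?"
    answer = str(a + b)
    return question, answer


def make_split(seed, size, max_value):
    """Build a reproducible list of pairs from a seeded generator."""
    rng = random.Random(seed)
    return [make_addition_pair(rng, max_value) for _ in range(size)]


# A small training split, plus a harder split with larger operands
# (an assumed way to probe generalization, not the paper's exact setup).
train = make_split(seed=0, size=5, max_value=99)
test_harder = make_split(seed=1, size=5, max_value=9999)

for question, answer in train:
    print(question, "->", answer)
```

Because both the question and the answer are plain strings, a sequence-to-sequence model can consume and produce them character by character without any task-specific parsing.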

Tasks

Mathematical Question Answering, Mathematical Reasoning, Math Word Problem Solving, Question Answering

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Mathematics Dataset	Transformer	Accuracy	0.76	—	Unverified
Mathematics Dataset	LSTM	Accuracy	0.57	—	Unverified
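
The Accuracy column reports the fraction of questions answered correctly. A minimal sketch of such a metric follows, assuming answers are scored by exact string match; the listing itself does not state the scoring rule, and the function name `exact_match_accuracy` and the sample data are hypothetical.

```python
def exact_match_accuracy(predictions, targets):
    """Fraction of predictions equal to their target string.

    Assumes exact string match with no normalization or partial credit.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(pred == target for pred, target in zip(predictions, targets))
    return correct / len(targets)


# Made-up model outputs and reference answers, for illustration only.
predictions = ["7", "12", "-3"]
targets = ["7", "13", "-3"]
score = exact_match_accuracy(predictions, targets)
print(f"accuracy = {score:.2f}")
```

Under this rule an answer that is numerically close but differs in any character counts as wrong.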
