ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

2023-09-29

Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen

Code

github.com/microsoft/tora
Official, in paper, PyTorch, ★ 1,114

Abstract

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learning on the annotations, and propose output space shaping to further refine models' reasoning behavior. As a result, ToRA models significantly outperform open-source models on 10 mathematical reasoning datasets across all scales with 13%-19% absolute improvements on average. Notably, ToRA-7B reaches 44.6% on the competition-level dataset MATH, surpassing the best open-source model WizardMath-70B by 22% absolute. ToRA-Code-34B is also the first open-source model that achieves an accuracy exceeding 50% on MATH, which significantly outperforms GPT-4's CoT result, and is competitive with GPT-4 solving problems with programs. Additionally, we conduct a comprehensive analysis of the benefits and remaining challenges of tool interaction for mathematical reasoning, providing valuable insights for future research.
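
As a rough illustration of what interleaving natural language reasoning with tool use can look like, the sketch below alternates between a model step and a program execution until a step contains no program. It is not the authors' implementation: the `<program>` and `<output>` delimiters, the `solve` and `run_program` helpers, and the `scripted_model` stand-in for a language model are all hypothetical, and the `exec` call has no sandboxing.

```python
import contextlib
import io
import re

# Hypothetical delimiters marking a program inside a model step.
PROGRAM = re.compile(r"<program>(.*?)</program>", re.DOTALL)


def run_program(source):
    """Execute a program and return whatever it printed (no sandboxing here)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue().strip()


def solve(question, generate, max_rounds=3):
    """Alternate model steps and tool calls until a step contains no program."""
    trajectory = question
    for _ in range(max_rounds):
        step = generate(trajectory)
        trajectory += "\n" + step
        match = PROGRAM.search(step)
        if match is None:
            break  # pure reasoning step, treated here as the final answer
        # Feed the tool result back so the next step can reason over it.
        trajectory += "\n<output>" + run_program(match.group(1)) + "</output>"
    return trajectory


def scripted_model(trajectory):
    """Stand-in for a language model: writes a program first, then concludes."""
    if "<output>" not in trajectory:
        return ("Sum the integers from 1 to 100.\n"
                "<program>print(sum(range(1, 101)))</program>")
    result = re.search(r"<output>(.*?)</output>", trajectory).group(1)
    return "The answer is " + result + "."


print(solve("What is 1 + 2 + ... + 100?", scripted_model))
```

The scripted model only exists to make the loop runnable end to end; in a real system the `generate` callable would be a trained model, and program execution would need isolation and timeouts.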

Tasks

Arithmetic Reasoning Computational Efficiency Imitation Learning Math Mathematical Problem-Solving Mathematical Reasoning Math Word Problem Solving

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
GSM8K	ToRA-70B (SC, k=50)	Accuracy	88.3	—	Unverified
GSM8K	ToRA-Code-34B (SC, k=50)	Accuracy	85.1	—	Unverified
GSM8K	ToRA-70B	Accuracy	84.3	—	Unverified
GSM8K	ToRA-Code-34B	Accuracy	80.7	—	Unverified
GSM8K	ToRA-Code-13B	Accuracy	75.8	—	Unverified
GSM8K	ToRA-Code-7B	Accuracy	72.6	—	Unverified
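
The "SC, k=50" rows are not explained on this page. Assuming SC denotes self-consistency, that is, sampling k solutions per problem and keeping the most frequent final answer, the voting step could be sketched as follows. The `majority_vote` helper and the sample answers are made up for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions."""
    counts = Counter(answers)
    return counts.most_common(1)[0][0]


# Made-up final answers from five sampled solutions to one problem.
samples = ["18", "18", "20", "18", "17"]
print(majority_vote(samples))  # prints 18
```

With k=50, the same function would be applied to 50 sampled answers per problem before scoring accuracy.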
