Math Word Problem Solving

A math word problem is a mathematical exercise (such as in a textbook, worksheet, or exam) where significant background information on the problem is presented in ordinary language rather than in mathematical notation. As most word problems involve a narrative of some sort, they are sometimes referred to as story problems and may vary in the amount of technical language used.
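Several of the systems listed below, such as the expression-tree and Graph2Tree entries, work by mapping the narrative of a word problem to an arithmetic expression that can then be evaluated. The sketch below illustrates only that target representation. The problem text is a made-up example, and the tree is written by hand rather than produced by any of these models. `Node`, `evaluate`, and `to_infix` are hypothetical names chosen for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Made-up example problem. The tree below is written by hand to show the kind
# of structure expression-tree solvers aim to produce; it is not model output.
PROBLEM = (
    "Tom has 3 boxes with 12 apples each. He gives away 7 apples. "
    "How many apples does he have left?"
)

# Binary operators allowed in the (hypothetical) expression language.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


@dataclass
class Node:
    """An internal tree node: an operator applied to two sub-expressions."""
    op: str
    left: "Expr"
    right: "Expr"


# A leaf is a plain number; an internal node is a Node.
Expr = Union[Node, float]


def evaluate(expr: Expr) -> float:
    """Recursively evaluate an arithmetic expression tree."""
    if isinstance(expr, Node):
        return OPS[expr.op](evaluate(expr.left), evaluate(expr.right))
    return float(expr)


def to_infix(expr: Expr) -> str:
    """Render the tree as a fully parenthesized infix string."""
    if isinstance(expr, Node):
        return f"({to_infix(expr.left)} {expr.op} {to_infix(expr.right)})"
    return str(expr)


# (3 boxes * 12 apples) - 7 apples given away
tree = Node("-", Node("*", 3, 12), 7)

if __name__ == "__main__":
    print(to_infix(tree), "=", evaluate(tree))
```

Running it prints `((3 * 12) - 7) = 29.0`. Keeping the answer as an explicit tree, rather than only a number, is what lets a solver's output be checked step by step.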

Papers


Showing 81–90 of 107 papers

Title	Date	Tasks	Status
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement	Sep 18, 2024	GSM8K, Math	Unverified
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models	Aug 1, 2023	In-Context Learning, Math	Unverified
Towards Interpretable Math Word Problem Solving with Grounded Linguistic Logic Reasoning	Nov 16, 2021	Math, Math Word Problem Solving	Unverified
Translating a Math Word Problem to a Expression Tree	Oct 1, 2018	Machine Translation, Math	Unverified
Using Intermediate Representations to Solve Math Word Problems	Jul 1, 2018	Math, Math Word Problem Solving	Unverified
When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems	Oct 16, 2024	Hallucination, Math	Unverified
Improving Compositional Generalization in Math Word Problem Solving	Sep 3, 2022	Data Augmentation, Math	Code available
Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning	Dec 9, 2023	Arithmetic Reasoning, Mathematical Reasoning	Code available
Reverse Operation based Data Augmentation for Solving Math Word Problems	Oct 4, 2020	Data Augmentation, Math	Code available
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation	Oct 17, 2024	GSM8K, Language Modeling	Code available



Datasets: MATH, SVAMP, MAWPS, Math23K, ALG514, ASDiv-A, ParaMAWPS, DRAW-1K, MathQA, SVAMP (1:N), GSM-Plus, MATH minival

Benchmark Results

MATH

#	Model	Metric	Claimed	Verified	Status
1	Gemini 2.0 Flash Experimental	Accuracy	89.7	—	Unverified
2	Qwen2.5-Math-72B-Instruct(TIR,Greedy)	Accuracy	88.1	—	Unverified
3	GPT-4 Turbo (MACM, w/code, voting)	Accuracy	87.92	—	Unverified
4	Qwen2.5-Math-72B-Instruct(COT,Greedy)	Accuracy	85.9	—	Unverified
5	Qwen2.5-Math-7B-Instruct(TIR,Greedy)	Accuracy	85.2	—	Unverified
6	GPT-4-code model (CSV, w/ code, SC, k=16)	Accuracy	84.3	—	Unverified
7	Qwen2-Math-72B-Instruct(greedy)	Accuracy	84	—	Unverified
8	Qwen2.5-Math-7B-Instruct(COT,Greedy)	Accuracy	83.6	—	Unverified
9	Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)	Accuracy	79.9	—	Unverified
10	OpenMath2-Llama3.1-70B (majority@256)	Accuracy	79.6	—	Unverified
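Several entries report scores obtained with sampling and voting ("majority@256", "self-consistency @ 5", "SC, k=16", "voting"). As these labels are generally understood, several solutions are sampled per problem, the most frequent final answer is kept, and accuracy is the percentage of problems whose kept answer matches the reference. The sketch below assumes final answers have already been extracted as strings, and its numeric-tolerance matching rule is an assumption; the papers' own evaluation scripts may normalize answers differently. The function names and toy data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(samples: Sequence[str]) -> str:
    """Pick the most frequent final answer among sampled solutions.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the answer that was sampled first.
    """
    return Counter(samples).most_common(1)[0][0]


def answers_match(pred: str, gold: str, tol: float = 1e-6) -> bool:
    """Assumed matching rule: numeric comparison within a tolerance,
    falling back to exact string comparison for non-numeric answers."""
    try:
        return abs(float(pred) - float(gold)) <= tol
    except ValueError:
        return pred.strip() == gold.strip()


def voting_accuracy(per_problem_samples: List[List[str]], gold_answers: List[str]) -> float:
    """Percentage of problems whose majority-voted answer matches the reference."""
    if len(per_problem_samples) != len(gold_answers):
        raise ValueError("need exactly one list of samples per problem")
    correct = sum(
        answers_match(majority_vote(samples), gold)
        for samples, gold in zip(per_problem_samples, gold_answers)
    )
    return 100.0 * correct / len(gold_answers)


# Made-up toy data: three problems with a few sampled answers each.
samples = [["29", "29", "31"], ["12", "15", "15"], ["7"]]
gold = ["29", "12", "7.0"]
print(voting_accuracy(samples, gold))  # → 66.66666666666667
```

Here the second problem's vote ("15") is wrong, while "7" matches "7.0" numerically, so two of three problems count as correct. Votes are taken over raw strings, so in practice answers would be normalized before voting as well.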

SVAMP

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 DUP	Accuracy	94.2	—	Unverified
2	GPT-4 (Teaching-Inspired)	Execution Accuracy	93.9	—	Unverified
3	GPT-4 (Model Selection)	Execution Accuracy	93.7	—	Unverified
4	Qwen2(CoT + Code Interpreter)	Execution Accuracy	92.3	—	Unverified
5	GPT-4 (PHP)	Execution Accuracy	91.9	—	Unverified
6	OpenMath-CodeLlama-70B (w/ code)	Execution Accuracy	87.8	—	Unverified
7	MathCoder-L-70B	Execution Accuracy	84.9	—	Unverified
8	PoT_Eng (self-consistency @ 5)	Execution Accuracy	83.7	—	Unverified
9	CoT_Eng (self-consistency @ 5)	Execution Accuracy	82.5	—	Unverified
10	MMOS-CODE-34B(0-shot)	Execution Accuracy	80.6	—	Unverified

MAWPS

#	Model	Metric	Claimed	Verified	Status
1	OpenMath-CodeLlama-70B (w/ code)	Accuracy (%)	95.7	—	Unverified
2	MsAT-DeductReasoner	Accuracy (%)	94.3	—	Unverified
3	ATHENA (roberta-large)	Accuracy (%)	93	—	Unverified
4	Exp-Tree	Accuracy (%)	92.3	—	Unverified
5	Multi-view	Accuracy (%)	92.3	—	Unverified
6	ATHENA (roberta-base)	Accuracy (%)	92.2	—	Unverified
7	Roberta-DeductReasoner	Accuracy (%)	92	—	Unverified
8	DeBERTa (PM + VM)	Accuracy (%)	91	—	Unverified
9	EPT	Accuracy (%)	88.7	—	Unverified
10	Graph2Tree with RoBERTa	Accuracy (%)	88.7	—	Unverified

Math23K

#	Model	Metric	Claimed	Verified	Status
1	GPT-4 (Teaching-Inspired)	Accuracy (5-fold)	94.3	—	Unverified
2	ATHENA (roberta-large)	Accuracy (training-test)	86.5	—	Unverified
3	Multi-view* (ours)	Accuracy (5-fold)	85.2	—	Unverified
4	ATHENA (roberta-base)	Accuracy (training-test)	84.4	—	Unverified
5	Generate and Rank	Accuracy (5-fold)	84.3	—	Unverified
6	Exp-Tree	Accuracy (5-fold)	84.1	—	Unverified
7	REAL2: Memory-augmented Solver	Accuracy (5-fold)	83.18	—	Unverified
8	Roberta-DeductReasoner	Accuracy (5-fold)	83	—	Unverified
9	MWP-BERT	Accuracy (5-fold)	82.4	—	Unverified
10	Recall and Learn	Accuracy (5-fold)	80.8	—	Unverified
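The last table mixes two evaluation protocols: "Accuracy (5-fold)" averages test accuracy over five train/test splits, while "Accuracy (training-test)" uses a single fixed split, so the two kinds of numbers are not strictly comparable. Below is a minimal sketch of 5-fold averaging, assuming contiguous folds over a fixed ordering of problems; published protocols may shuffle first or use predefined folds. `train_and_score` is a hypothetical callback that trains on one index set and returns accuracy on the other.

```python
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is the test set once."""
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # The first n % k folds get one extra item so every index is used exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def cross_validated_accuracy(
    n: int,
    train_and_score: Callable[[Sequence[int], Sequence[int]], float],
    k: int = 5,
) -> float:
    """Average the per-fold test accuracy returned by the (hypothetical) callback."""
    scores = [train_and_score(train, test) for train, test in k_fold_indices(n, k)]
    return sum(scores) / len(scores)


# Toy check with a dummy scorer that just reports the size of each test fold.
print(cross_validated_accuracy(12, lambda train, test: len(test)))  # → 2.4
```

With 12 items and 5 folds, the test folds have sizes 3, 3, 2, 2, 2, so the dummy scorer averages to 2.4. A real scorer would train a solver on `train` and return its accuracy on `test`.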