SOTAVerified

Math Word Problem Solving

A math word problem is a mathematical exercise (such as in a textbook, worksheet, or exam) where significant background information on the problem is presented in ordinary language rather than in mathematical notation. As most word problems involve a narrative of some sort, they are sometimes referred to as story problems and may vary in the amount of technical language used.

Papers

Showing 110 of 107 papers

Show:102550
← PrevPage 1 of 11Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1Gemini 2.0 Flash ExperimentalAccuracy89.7Unverified
2Qwen2.5-Math-72B-Instruct(TIR,Greedy)Accuracy88.1Unverified
3GPT-4 Turbo (MACM, w/code, voting)Accuracy87.92Unverified
4Qwen2.5-Math-72B-Instruct(COT,Greedy)Accuracy85.9Unverified
5Qwen2.5-Math-7B-Instruct(TIR,Greedy)Accuracy85.2Unverified
6GPT-4-code model (CSV, w/ code, SC, k=16)Accuracy84.3Unverified
7Qwen2-Math-72B-Instruct(greedy)Accuracy84Unverified
8Qwen2.5-Math-7B-Instruct(COT,Greedy)Accuracy83.6Unverified
9Qwen2.5-Math-1.5B-Instruct(TIR,Greedy)Accuracy79.9Unverified
10OpenMath2-Llama3.1-70B (majority@256)Accuracy79.6Unverified
#ModelMetricClaimedVerifiedStatus
1GPT-4 DUPAccuracy94.2Unverified
2GPT-4 (Teaching-Inspired)Execution Accuracy93.9Unverified
3GPT-4 (Model Selection)Execution Accuracy93.7Unverified
4Qwen2(CoT + Code Interpreter)Execution Accuracy92.3Unverified
5GPT-4 (PHP)Execution Accuracy91.9Unverified
6OpenMath-CodeLlama-70B (w/ code)Execution Accuracy87.8Unverified
7MathCoder-L-70BExecution Accuracy84.9Unverified
8PoT_Eng (self-consistency @ 5)Execution Accuracy83.7Unverified
9CoT_Eng (self-consistency @ 5)Execution Accuracy82.5Unverified
10MMOS-CODE-34B(0-shot)Execution Accuracy80.6Unverified
#ModelMetricClaimedVerifiedStatus
1OpenMath-CodeLlama-70B (w/ code)Accuracy (%)95.7Unverified
2MsAT-DeductReasonerAccuracy (%)94.3Unverified
3ATHENA (roberta-large)Accuracy (%)93Unverified
4Exp-TreeAccuracy (%)92.3Unverified
5Multi-viewAccuracy (%)92.3Unverified
6ATHENA (roberta-base)Accuracy (%)92.2Unverified
7Roberta-DeductReasonerAccuracy (%)92Unverified
8DeBERTa (PM + VM)Accuracy (%)91Unverified
9EPTAccuracy (%)88.7Unverified
10Graph2Tree with RoBERTaAccuracy (%)88.7Unverified
#ModelMetricClaimedVerifiedStatus
1GPT-4 (Teaching-Inspired)Accuracy (5-fold)94.3Unverified
2ATHENA (roberta-large)Accuracy (training-test)86.5Unverified
3Multi-view* (ours)Accuracy (5-fold)85.2Unverified
4ATHENA (roberta-base)Accuracy (training-test)84.4Unverified
5Generate and RankAccuracy (5-fold)84.3Unverified
6Exp-TreeAccuracy (5-fold)84.1Unverified
7REAL2: Memory-augmented SolverAccuracy (5-fold)83.18Unverified
8Roberta-DeductReasonerAccuracy (5-fold)83Unverified
9MWP-BERTAccuracy (5-fold)82.4Unverified
10Recall and LearnAccuracy (5-fold)80.8Unverified