SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 291–300 of 439 papers

Title	Date	Tasks	Status	Hype
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step	May 23, 2024	GSM8K	CodeCode Available	3
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8KHumanEval	CodeCode Available	1
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark	May 20, 2024	College MathematicsGSM8K	CodeCode Available	2
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving	May 20, 2024	GSM8KMath	—Unverified	0
Meaning-Typed Programming: Language Abstraction and Runtime for Model-Integrated Applications	May 14, 2024	GSM8KMath	—Unverified	0
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning	May 13, 2024	Data AugmentationGSM8K	CodeCode Available	3
MathDivide: Improved mathematical reasoning by large language models	May 12, 2024	GSM8KLogical Reasoning	—Unverified	0
MAmmoTH2: Scaling Instructions from the Web	May 6, 2024	ChatbotGSM8K	—Unverified	0
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning	May 5, 2024	GSM8KMath	CodeCode Available	2
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning	May 1, 2024	ARCGSM8K	CodeCode Available	3

Show:10 25 50

← PrevPage 30 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified