GSM8K

Papers

Showing 51–60 of 439 papers

Title	Date	Tasks	Status	Hype
Preference Optimization for Reasoning with Pseudo Feedback	Nov 25, 2024	GSM8K, Math	Code Available	2
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models	Mar 21, 2025	GSM8K, Question Answering	Code Available	2
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts	Feb 12, 2024	Continual Pretraining, GSM8K	Code Available	2
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning	May 5, 2024	GSM8K, Math	Code Available	2
Meta Prompting for AI Systems	Nov 20, 2023	Data Interaction, GSM8K	Code Available	2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark	May 20, 2024	College Mathematics, GSM8K	Code Available	2
ProcessBench: Identifying Process Errors in Mathematical Reasoning	Dec 9, 2024	GSM8K, Math	Code Available	2
Progressive-Hint Prompting Improves Reasoning in Large Language Models	Apr 19, 2023	Arithmetic Reasoning, GSM8K	Code Available	2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization	Oct 11, 2024	GSM8K, Language Modeling	Code Available	2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	Oct 5, 2023	Arithmetic Reasoning, GSM8K	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified