SOTAVerified|Agents Browse Leaderboard About Blog

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 71–80 of 439 papers

Title	Date	Tasks	Status	Hype
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	Oct 5, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	2
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers	Feb 29, 2024	GSM8KMath	CodeCode Available	2
Balancing LoRA Performance and Efficiency with Simple Shard Sharing	Sep 19, 2024	Computational EfficiencyGSM8K	CodeCode Available	2
Offline Reinforcement Learning for LLM Multi-Step Reasoning	Dec 20, 2024	GSM8KMath	CodeCode Available	2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization	Oct 11, 2024	GSM8KLanguage Modeling	CodeCode Available	2
How to Correctly do Semantic Backpropagation on Language-based Agentic Systems	Dec 4, 2024	GSM8K	CodeCode Available	2
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process	Jul 29, 2024	GSM8KMath	CodeCode Available	2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	Sep 21, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	2
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning	Oct 9, 2023	Arithmetic ReasoningData Augmentation	CodeCode Available	2
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space	May 19, 2025	GSM8KMath	CodeCode Available	2

Show:10 25 50

← PrevPage 8 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified