SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 41–50 of 439 papers

Title	Date	Tasks	Status	Hype
Synthetic Data RL: Task Definition Is All You Need	May 18, 2025	AllGSM8K	CodeCode Available	2
SLOT: Sample-specific Language Model Optimization at Test-time	May 18, 2025	GSM8KLanguage Modeling	CodeCode Available	2
Dynamic Early Exit in Reasoning Models	Apr 22, 2025	GSM8KMath	CodeCode Available	2
SEAL: Steerable Reasoning Calibration of Large Language Models for Free	Apr 7, 2025	GSM8K	CodeCode Available	2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models	Mar 28, 2025	GPUGSM8K	CodeCode Available	2
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models	Mar 21, 2025	GSM8KQuestion Answering	CodeCode Available	2
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models	Feb 24, 2025	GSM8KMath	CodeCode Available	2
SIFT: Grounding LLM Reasoning in Contexts via Stickers	Feb 19, 2025	GSM8KMath	CodeCode Available	2
CoT-Valve: Length-Compressible Chain-of-Thought Tuning	Feb 13, 2025	GSM8K	CodeCode Available	2
Natural Language Fine-Tuning	Dec 29, 2024	GSM8KLarge Language Model	CodeCode Available	2

Show:10 25 50

← PrevPage 5 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified