SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 351–360 of 439 papers

Title	Date	Tasks	Status	Hype	Score
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models	May 12, 2025	GSM8KLarge Language Model	—Unverified	0	0
Uncovering Latent Chain of Thought Vectors in Language Models	Sep 21, 2024	ARCGSM8K	—Unverified	0	0
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models	Aug 28, 2024	Data AugmentationGSM8K	—Unverified	0	0
Cool-Fusion: Fuse Large Language Models without Training	Jul 29, 2024	Combinatorial OptimizationGSM8K	—Unverified	0	0
ControlMath: Controllable Data Generation Promotes Math Generalist Models	Sep 20, 2024	Data AugmentationDiversity	—Unverified	0	0
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On	Jul 11, 2024	GSM8KMath	—Unverified	0	0
Slimming Down LLMs Without Losing Their Minds	Jun 12, 2025	Computational EfficiencyGSM8K	—Unverified	0	0
Contrastive Decoding Improves Reasoning in Large Language Models	Sep 17, 2023	GSM8KHellaSwag	—Unverified	0	0
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Jul 29, 2024	GSM8KPrompt Engineering	—Unverified	0	0
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment	Oct 23, 2024	GSM8KHumanEval	—Unverified	0	0

Show:10 25 50

← PrevPage 36 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified