SOTAVerified

GSM8K

Papers

Showing 81–90 of 439 papers

Title	Date	Tasks	Status	Hype
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models	Aug 3, 2023	Arithmetic Reasoning, GSM8K	Code Available	2
Multiple-Choice Questions are Efficient and Robust LLM Evaluators	May 20, 2024	GSM8K, HumanEval	Code Available	1
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	Dec 14, 2023	Arithmetic Reasoning, GSM8K	Code Available	1
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes	Oct 22, 2024	GSM8K, Language Modeling	Code Available	1
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models	Mar 4, 2024	Data Augmentation, GSM8K	Code Available	1
Automatic Model Selection with Large Language Models for Reasoning	May 23, 2023	Arithmetic Reasoning, GSM8K	Code Available	1
CommVQ: Commutative Vector Quantization for KV Cache Compression	Jun 23, 2025	GPU, GSM8K	Code Available	1
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8K, Math	Code Available	1
Coevolving with the Other You: Fine-Tuning LLM with Sequential Cooperative Multi-Agent Reinforcement Learning	Oct 8, 2024	GSM8K, Multi-agent Reinforcement Learning	Code Available	1
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization	Oct 27, 2024	GSM8K, HellaSwag	Code Available	1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified