SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 161–170 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Exploring LLM Reasoning Through Controlled Prompt Variations	Apr 2, 2025	GSM8KMathematical Problem-Solving	CodeCode Available	0	5
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning	Aug 21, 2023	GSM8K	CodeCode Available	0	5
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need	Jun 18, 2025	GSM8KHumanEval	CodeCode Available	0	5
Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems	Sep 30, 2024	GSM8KMath	CodeCode Available	0	5
Re-Initialization Token Learning for Tool-Augmented Large Language Models	Jun 17, 2025	GSM8KQuestion Answering	CodeCode Available	0	5
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation	Oct 17, 2024	GSM8KLanguage Modeling	CodeCode Available	0	5
Activation Steering for Chain-of-Thought Compression	Jul 7, 2025	GSM8KMath	CodeCode Available	0	5
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models	Apr 3, 2025	GSM8KReinforcement Learning (RL)	CodeCode Available	0	5
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	May 22, 2025	GSM8KMath	CodeCode Available	0	5
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation	Feb 28, 2025	GSM8K	CodeCode Available	0	5

Show:10 25 50

← PrevPage 17 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified