| Title | Date | Topics |
| --- | --- | --- |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic Reasoning, GSM8K |
| When is the consistent prediction likely to be a correct prediction? | Jul 8, 2024 | GSM8K, Prediction |
| Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning | Feb 16, 2025 | GSM8K |
| The Role of Deductive and Inductive Reasoning in Large Language Models | Oct 3, 2024 | GSM8K |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | Feb 9, 2024 | Arithmetic Reasoning, GSM8K |
| Think before you speak: Training Language Models With Pause Tokens | Oct 3, 2023 | Decoder, GSM8K |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | Oct 10, 2024 | Arithmetic Reasoning, Computational Efficiency |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | Jun 5, 2025 | GSM8K, Math |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | Aug 18, 2024 | Diversity, GPU |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning, GSM8K |