GSM8K

Papers

Showing 181–190 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Feb 16, 2025	Computational Efficiency, GSM8K	Code Available	0	5
Can LLMs Reason in the Wild with Programs?	Jun 19, 2024	GSM8K, Math	Code Available	0	5
DIVE: Diversified Iterative Self-Improvement	Jan 1, 2025	Diversity, GSM8K	Code Available	0	5
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning	May 23, 2023	Arithmetic Reasoning, GSM8K	Code Available	0	5
ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving	Jan 14, 2025	GSM8K, Math	Code Available	0	5
Distilling Reasoning Capabilities into Smaller Language Models	Dec 1, 2022	GSM8K, Knowledge Distillation	Code Available	0	5
Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning	Oct 16, 2024	All, GSM8K	Code Available	0	5
NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models	Feb 20, 2025	GSM8K, Natural Language Understanding	Code Available	0	5
Discriminative Policy Optimization for Token-Level Reward Models	May 29, 2025	GSM8K, Language Modeling	Code Available	0	5
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory	Jan 11, 2025	GSM8K, Quantization	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified