SOTAVerified

GSM8K

Papers

Showing 171–180 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Adaptive Rectification Sampling for Test-Time Compute Scaling	Apr 2, 2025	GSM8K, Logical Reasoning	Code Available	0	5
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning	Sep 16, 2023	Date Understanding, GSM8K	Code Available	0	5
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models	Apr 3, 2025	GSM8K, Reinforcement Learning (RL)	Code Available	0	5
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective	Feb 20, 2025	GSM8K, Math	Code Available	0	5
Re-Initialization Token Learning for Tool-Augmented Large Language Models	Jun 17, 2025	GSM8K, Question Answering	Code Available	0	5
Scaling Speculative Decoding with Lookahead Reasoning	Jun 24, 2025	GPU, GSM8K	Code Available	0	5
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Feb 16, 2025	Computational Efficiency, GSM8K	Code Available	0	5
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning	May 23, 2023	Arithmetic Reasoning, GSM8K	Code Available	0	5
Can LLMs Reason in the Wild with Programs?	Jun 19, 2024	GSM8K, Math	Code Available	0	5
NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models	Feb 20, 2025	GSM8K, Natural Language Understanding	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified