SOTAVerified

GSM8K

Papers

Showing 381–390 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use	Apr 7, 2025	GSM8K, Math	Unverified	0	0
Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning	Dec 14, 2023	Arithmetic Reasoning, Few-Shot Learning	Unverified	0	0
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts	May 25, 2025	GSM8K	Unverified	0	0
System-2 Mathematical Reasoning via Enriched Instruction Tuning	Dec 22, 2024	ERP, GSM8K	Unverified	0	0
BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation	Feb 3, 2025	Diversity, GSM8K	Unverified	0	0
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs	Mar 18, 2025	GSM8K, Math	Unverified	0	0
Teaching Small Language Models to Reason	Dec 16, 2022	GSM8K, Knowledge Distillation	Unverified	0	0
Adaptive Decoding via Latent Preference Optimization	Nov 14, 2024	GSM8K, Instruction Following	Unverified	0	0
Adapting LLM Agents with Universal Feedback in Communication	Oct 1, 2023	Decision Making, GSM8K	Unverified	0	0
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback	Oct 31, 2023	GSM8K, MMLU	Unverified	0	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified