| Title | Date | Tags | Code | Count |
| --- | --- | --- | --- | --- |
| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | Apr 16, 2025 | GSM8K, Math | Unverified | 0 |
| Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Apr 13, 2025 | GSM8K, Math | Code Available | 3 |
| Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | Apr 13, 2025 | GSM8K | Unverified | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8K, Math | Unverified | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Apr 7, 2025 | GSM8K, Math | Unverified | 0 |
| SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Apr 7, 2025 | GSM8K | Code Available | 2 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | Apr 4, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | Apr 4, 2025 | Benchmarking, GSM8K | Unverified | 0 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8K, In-Context Learning | Code Available | 1 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Apr 3, 2025 | GSM8K, Reinforcement Learning (RL) | Code Available | 0 |