SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 171–180 of 439 papers

Title	Date	Tasks	Status	Hype
Explicit Knowledge Transfer for Weakly-Supervised Code Generation	Nov 30, 2022	Code GenerationFew-Shot Learning	—Unverified	0
Contrastive Decoding Improves Reasoning in Large Language Models	Sep 17, 2023	GSM8KHellaSwag	—Unverified	0
Excessive Reasoning Attack on Reasoning LLMs	Jun 17, 2025	GSM8K	—Unverified	0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization	Feb 8, 2025	GSM8KMath	—Unverified	0
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Jul 29, 2024	GSM8KPrompt Engineering	—Unverified	0
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning	Dec 5, 2024	Few-Shot LearningGSM8K	—Unverified	0
Evaluation of LLMs for mathematical problem solving	May 30, 2025	GSM8KMathematical Problem-Solving	—Unverified	0
Complexity-Based Prompting for Multi-Step Reasoning	Oct 3, 2022	Date UnderstandingGSM8K	—Unverified	0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning	Jun 29, 2024	Binary ClassificationGSM8K	—Unverified	0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation	Apr 16, 2025	GSM8KMath	—Unverified	0

Show:10 25 50

← PrevPage 18 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified