SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 221–230 of 439 papers

Title	Date	Tasks	Status	Hype
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search	Jun 10, 2025	GSM8KMath	—Unverified	0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation	Apr 16, 2025	GSM8KMath	—Unverified	0
Evaluation of LLMs for mathematical problem solving	May 30, 2025	GSM8KMathematical Problem-Solving	—Unverified	0
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning	Dec 5, 2024	Few-Shot LearningGSM8K	—Unverified	0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization	Feb 8, 2025	GSM8KMath	—Unverified	0
Excessive Reasoning Attack on Reasoning LLMs	Jun 17, 2025	GSM8K	—Unverified	0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation	Nov 30, 2022	Code GenerationFew-Shot Learning	—Unverified	0
Exploring an LM to generate Prolog Predicates from Mathematics Questions	Sep 7, 2023	GSM8KLanguage Modeling	—Unverified	0
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree	Dec 17, 2024	GSM8KHumanEval	—Unverified	0
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty	Jun 12, 2025	GSM8K	—Unverified	0

Show:10 25 50

← PrevPage 23 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified