SOTAVerified

GSM8K

Papers

Showing 41–50 of 439 papers

| Title | Status | Hype |
|---|---|---|
| Synthetic Data RL: Task Definition Is All You Need | Code | 2 |
| SLOT: Sample-specific Language Model Optimization at Test-time | Code | 2 |
| Dynamic Early Exit in Reasoning Models | Code | 2 |
| SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Code | 2 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2 |
| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2 |
| SIFT: Grounding LLM Reasoning in Contexts via Stickers | Code | 2 |
| CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Code | 2 |
| Natural Language Fine-Tuning | Code | 2 |
Page 5 of 44

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |