SOTAVerified

GSM8K

Papers

Showing 26–50 of 439 papers

Title | Status | Hype
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3
Automatic Instruction Evolving for Large Language Models | Code | 3
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | Code | 3
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning | Code | 3
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Code | 3
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | Code | 3
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline | Code | 3
SkyMath: Technical Report | Code | 3
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models | Code | 3
PAL: Program-aided Language Models | Code | 3
Training Verifiers to Solve Math Word Problems | Code | 3
any4: Learned 4-bit Numeric Representation for LLMs | Code | 2
Let LLMs Break Free from Overthinking via Self-Braking Tuning | Code | 2
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | Code | 2
Synthetic Data RL: Task Definition Is All You Need | Code | 2
SLOT: Sample-specific Language Model Optimization at Test-time | Code | 2
Dynamic Early Exit in Reasoning Models | Code | 2
SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2
SIFT: Grounding LLM Reasoning in Contexts via Stickers | Code | 2
CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Code | 2
Natural Language Fine-Tuning | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified