SOTAVerified

GSM8K

Papers

Showing 51–75 of 439 papers

Title | Status | Hype
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | Code | 2
Let LLMs Break Free from Overthinking via Self-Braking Tuning | Code | 2
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning | Code | 2
Preference Optimization for Reasoning with Pseudo Feedback | Code | 2
Large Language Models are Zero-Shot Reasoners | Code | 2
ProcessBench: Identifying Process Errors in Mathematical Reasoning | Code | 2
Dynamic Early Exit in Reasoning Models | Code | 2
Progressive-Hint Prompting Improves Reasoning in Large Language Models | Code | 2
any4: Learned 4-bit Numeric Representation for LLMs | Code | 2
Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Code | 2
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
Meta Prompting for AI Systems | Code | 2
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Code | 2
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | Code | 2
Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Code | 2
Natural Language Fine-Tuning | Code | 2
Language Models are Multilingual Chain-of-Thought Reasoners | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 |  | Unverified
2 | Orange-mini | 0-shot MRR | 98 |  | Unverified