SOTAVerified

GSM8K

Papers

Showing 41-50 of 439 papers

Title | Status | Hype
Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2
Meta Prompting for AI Systems | Code | 2
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | Code | 2
Natural Language Fine-Tuning | Code | 2
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Code | 2
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2
Dynamic Early Exit in Reasoning Models | Code | 2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified