SOTAVerified

GSM8K

Papers

Showing 61-70 of 439 papers

Title | Status | Hype
Language Models are Multilingual Chain-of-Thought Reasoners | Code | 2
Natural Language Fine-Tuning | Code | 2
Preference Optimization for Reasoning with Pseudo Feedback | Code | 2
Meta Prompting for AI Systems | Code | 2
any4: Learned 4-bit Numeric Representation for LLMs | Code | 2
Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Code | 2
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified