SOTAVerified

GSM8K

Papers

Showing 431-439 of 439 papers

Title | Status | Hype
Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0
DIVE: Diversified Iterative Self-Improvement | Code | 0
ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Code | 0
Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Code | 0
Distilling Reasoning Capabilities into Smaller Language Models | Code | 0
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting | Code | 0
Discriminative Policy Optimization for Token-Level Reward Models | Code | 0
DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory | Code | 0
Page 44 of 44

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 |  | Unverified
2 | Orange-mini | 0-shot MRR | 98 |  | Unverified