SOTAVerified

GSM8K

Papers

Showing 426-439 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Code | 0 |
| TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | Code | 0 |
| How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Code | 0 |
| GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment | Code | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0 |
| DIVE: Diversified Iterative Self-Improvement | Code | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Code | 0 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0 |
| Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Code | 0 |
| Distilling Reasoning Capabilities into Smaller Language Models | Code | 0 |
| AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting | Code | 0 |
| Discriminative Policy Optimization for Token-Level Reward Models | Code | 0 |
| DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |