SOTAVerified

GSM8K

Papers

Showing 391-400 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| The ART of LLM Refinement: Ask, Refine, and Trust | | 0 |
| When is the consistent prediction likely to be a correct prediction? | | 0 |
| Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning | | 0 |
| The Role of Deductive and Inductive Reasoning in Large Language Models | | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| Think before you speak: Training Language Models With Pause Tokens | | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | | 0 |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | | 0 |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |