SOTAVerified

GSM8K

Papers

Showing 201–225 of 439 papers

Title | Status | Hype
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | | 0
Cost-Saving LLM Cascades with Early Abstention | | 0
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models | | 0
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks | | 0
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic | | 0
D^2LoRA: Data-Driven LoRA Initialization for Low Resource Tasks | | 0
Dialectical Behavior Therapy Approach to LLM Prompting | | 0
Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models | | 0
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? | | 0
DiversiGATE: A Comprehensive Framework for Reliable Large Language Models | | 0
DNA 1.0 Technical Report | | 0
Does your data spark joy? Performance gains from domain upsampling at the end of training | | 0
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | | 0
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | | 0
Dual Decomposition of Weights and Singular Value Low Rank Adaptation | | 0
Dynamic Parallel Tree Search for Efficient LLM Reasoning | | 0
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | | 0
Efficient Data Selection at Scale via Influence Distillation | | 0
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | | 0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | | 0
Evaluation of LLMs for mathematical problem solving | | 0
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | | 0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified