SOTAVerified

GSM8K

Papers

Showing 351–375 of 439 papers

| Title | Status | Hype |
|---|---|---|
| System-2 Mathematical Reasoning via Enriched Instruction Tuning | | 0 |
| Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | | 0 |
| Teaching Small Language Models to Reason | | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | | 0 |
| The Role of Deductive and Inductive Reasoning in Large Language Models | | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0 |
| Think before you speak: Training Language Models With Pause Tokens | | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | | 0 |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | | 0 |
| Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | | 0 |
| Towards Multilingual LLM Evaluation for European Languages | | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | | 0 |
| Training Chain-of-Thought via Latent-Variable Inference | | 0 |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | | 0 |
| Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning | | 0 |
| TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling | | 0 |
| Uncertainty Aware Learning for Language Model Alignment | | 0 |
| Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | | 0 |
| Unlocking Structured Thinking in Language Models with Cognitive Prompting | | 0 |
| Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |