SOTAVerified

GSM8K

Papers

Showing 326-350 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| Self-Evaluation Guided Beam Search for Reasoning | | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | | 0 |
| Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination | | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | | 0 |
| Self-Training Large Language Models for Tool-Use Without Demonstrations | | 0 |
| Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | | 0 |
| Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models | | 0 |
| S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | | 0 |
| SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | | 0 |
| Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | | 0 |
| Slimming Down LLMs Without Losing Their Minds | | 0 |
| SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | | 0 |
| SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | | 0 |
| Solving math word problems with process- and outcome-based feedback | | 0 |
| SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | | 0 |
| Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation | | 0 |
| STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | | 0 |
| Subtle Errors Matter: Preference Learning via Error-injected Self-editing | | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | | 0 |
| Supervisory Prompt Training | | 0 |
| Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | | 0 |
| SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | | 0 |
| System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |