SOTAVerified

GSM8K

Papers

Showing 351–400 of 439 papers

Title | Status | Hype
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | | 0
Uncovering Latent Chain of Thought Vectors in Language Models | | 0
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | | 0
Cool-Fusion: Fuse Large Language Models without Training | | 0
ControlMath: Controllable Data Generation Promotes Math Generalist Models | | 0
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | | 0
Slimming Down LLMs Without Losing Their Minds | | 0
Contrastive Decoding Improves Reasoning in Large Language Models | | 0
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | | 0
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment | | 0
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | | 0
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | | 0
Complexity-Based Prompting for Multi-Step Reasoning | | 0
Solving math word problems with process- and outcome-based feedback | | 0
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | | 0
Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | | 0
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | | 0
Steering LLM Reasoning Through Bias-Only Adaptation | | 0
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | | 0
Can Separators Improve Chain-of-Thought Prompting? | | 0
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation | | 0
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | | 0
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | | 0
Subtle Errors Matter: Preference Learning via Error-injected Self-editing | | 0
Building Math Agents with Multi-Turn Iterative Preference Learning | | 0
BrainTransformers: SNN-LLM | | 0
Supervised Optimism Correction: Be Confident When LLMs Are Sure | | 0
Supervisory Prompt Training | | 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | | 0
SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | | 0
Fewer is More: Boosting LLM Reasoning with Reinforced Context Pruning | | 0
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | | 0
System-2 Mathematical Reasoning via Enriched Instruction Tuning | | 0
BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation | | 0
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | | 0
Teaching Small Language Models to Reason | | 0
Adaptive Decoding via Latent Preference Optimization | | 0
Adapting LLM Agents with Universal Feedback in Communication | | 0
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | | 0
The ART of LLM Refinement: Ask, Refine, and Trust | | 0
When is the consistent prediction likely to be a correct prediction? | | 0
Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning | | 0
The Role of Deductive and Inductive Reasoning in Large Language Models | | 0
The Unreasonable Effectiveness of Eccentric Automatic Prompts | | 0
Think before you speak: Training Language Models With Pause Tokens | | 0
Think Beyond Size: Adaptive Prompting for More Effective Reasoning | | 0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | | 0
Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | | 0
TinyGSM: achieving >80% on GSM8k with small language models | | 0
Page 8 of 9

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified