SOTAVerified

GSM8K

Papers

Showing 301-350 of 439 papers

Title | Status | Hype
Pheromone-based Learning of Optimal Reasoning Paths | | 0
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models | | 0
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning | | 0
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches | | 0
PORT: Preference Optimization on Reasoning Traces | | 0
Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | | 0
Predicting Emergent Capabilities by Finetuning | | 0
Premise Order Matters in Reasoning with Large Language Models | | 0
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | | 0
Prompt Baking | | 0
Prompt Engineering a Prompt Engineer | | 0
Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | | 0
Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | | 0
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | | 0
Quasi-random Multi-Sample Inference for Large Language Models | | 0
Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | | 0
Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | | 0
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | | 0
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | | 0
ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | | 0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths | | 0
Reasoning Robustness of LLMs to Adversarial Typographical Errors | | 0
Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models | | 0
Self-Consistency Preference Optimization | | 0
Self-Evaluation Guided Beam Search for Reasoning | | 0
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | | 0
Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination | | 0
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | | 0
Self-Training Large Language Models for Tool-Use Without Demonstrations | | 0
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs | | 0
Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models | | 0
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | | 0
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | | 0
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | | 0
Slimming Down LLMs Without Losing Their Minds | | 0
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | | 0
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | | 0
Solving math word problems with process- and outcome-based feedback | | 0
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | | 0
Steering LLM Reasoning Through Bias-Only Adaptation | | 0
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation | | 0
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | | 0
Subtle Errors Matter: Preference Learning via Error-injected Self-editing | | 0
Supervised Optimism Correction: Be Confident When LLMs Are Sure | | 0
Supervisory Prompt Training | | 0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency | | 0
SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning | | 0
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | | 0
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified