SOTAVerified

GSM8K

Papers

Showing 251-300 of 439 papers

Title | Status | Hype
When is the consistent prediction likely to be a correct prediction? | | 0
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | | 0
Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | | 0
No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function | | 0
Nudging: Inference-time Alignment of LLMs via Guided Decoding | | 0
Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0
On Designing Effective RL Reward at Training Time for LLM Reasoning | | 0
Uncertainty Aware Learning for Language Model Alignment | | 0
Making Large Language Models Better Reasoners with Step-Aware Verifier | | 0
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | | 0
Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | | 0
Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | | 0
Exploring an LM to generate Prolog Predicates from Mathematics Questions | | 0
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | | 0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation | | 0
PARAMANU-GANITA: Language Model with Mathematical Capabilities | | 0
Patience Is The Key to Large Language Model Reasoning | | 0
PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | | 0
Pheromone-based Learning of Optimal Reasoning Paths | | 0
Excessive Reasoning Attack on Reasoning LLMs | | 0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | | 0
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models | | 0
PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning | | 0
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches | | 0
PORT: Preference Optimization on Reasoning Traces | | 0
Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | | 0
Predicting Emergent Capabilities by Finetuning | | 0
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | | 0
Premise Order Matters in Reasoning with Large Language Models | | 0
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | | 0
Evaluation of LLMs for mathematical problem solving | | 0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | | 0
Prompt Baking | | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
Prompt Engineering a Prompt Engineer | | 0
Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | | 0
Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | | 0
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | | 0
Quasi-random Multi-Sample Inference for Large Language Models | | 0
Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | | 0
Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | | 0
Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | | 0
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | | 0
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | | 0
Efficient Data Selection at Scale via Influence Distillation | | 0
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | | 0
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified