SOTAVerified

GSM8K

Papers

Showing 251-275 of 439 papers

Title | Status | Hype
A Careful Examination of Large Language Model Performance on Grade School Arithmetic | | 0
Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | | 0
Uncertainty Aware Learning for Language Model Alignment | | 0
No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function | | 0
Nudging: Inference-time Alignment of LLMs via Guided Decoding | | 0
Fine-Grained Self-Endorsement Improves Factuality and Reasoning | | 0
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0
On Designing Effective RL Reward at Training Time for LLM Reasoning | | 0
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | | 0
Making Large Language Models Better Reasoners with Step-Aware Verifier | | 0
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | | 0
Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | | 0
Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | | 0
Exploring an LM to generate Prolog Predicates from Mathematics Questions | | 0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | | 0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation | | 0
PARAMANU-GANITA: Language Model with Mathematical Capabilities | | 0
Patience Is The Key to Large Language Model Reasoning | | 0
PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | | 0
Pheromone-based Learning of Optimal Reasoning Paths | | 0
Excessive Reasoning Attack on Reasoning LLMs | | 0
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | | 0
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified