SOTAVerified

GSM8K

Papers

Showing 276–300 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| MAmmoTH2: Scaling Instructions from the Web | | 0 |
| MathAttack: Attacking Large Language Models Towards Math Solving Ability | | 0 |
| MathDivide: Improved mathematical reasoning by large language models | | 0 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | | 0 |
| MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | | 0 |
| Maximizing Confidence Alone Improves Reasoning | | 0 |
| Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | | 0 |
| MIND: Math Informed syNthetic Dialogues for Pretraining LLMs | | 0 |
| MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time | | 0 |
| Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs | | 0 |
| Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference | | 0 |
| Model Unlearning via Sparse Autoencoder Subspace Guided Projections | | 0 |
| Multi-Reference Preference Optimization for Large Language Models | | 0 |
| Multi-step Problem Solving Through a Verifier: An Empirical Analysis on Model-induced Process Supervision | | 0 |
| Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | | 0 |
| No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function | | 0 |
| Nudging: Inference-time Alignment of LLMs via Guided Decoding | | 0 |
| On Designing Effective RL Reward at Training Time for LLM Reasoning | | 0 |
| Making Large Language Models Better Reasoners with Step-Aware Verifier | | 0 |
| Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | | 0 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | | 0 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | | 0 |
| Patience Is The Key to Large Language Model Reasoning | | 0 |
| PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |