GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 351–375 of 439 papers

Title	Date	Tasks	Status
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models	May 12, 2025	GSM8KLarge Language Model	—Unverified
Uncovering Latent Chain of Thought Vectors in Language Models	Sep 21, 2024	ARCGSM8K	—Unverified
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models	Aug 28, 2024	Data AugmentationGSM8K	—Unverified
Cool-Fusion: Fuse Large Language Models without Training	Jul 29, 2024	Combinatorial OptimizationGSM8K	—Unverified
ControlMath: Controllable Data Generation Promotes Math Generalist Models	Sep 20, 2024	Data AugmentationDiversity	—Unverified
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On	Jul 11, 2024	GSM8KMath	—Unverified
Slimming Down LLMs Without Losing Their Minds	Jun 12, 2025	Computational EfficiencyGSM8K	—Unverified
Contrastive Decoding Improves Reasoning in Large Language Models	Sep 17, 2023	GSM8KHellaSwag	—Unverified
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost	Jul 29, 2024	GSM8KPrompt Engineering	—Unverified
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment	Oct 23, 2024	GSM8KHumanEval	—Unverified
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs	Dec 11, 2024	ARCGSM8K	—Unverified
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning	Mar 6, 2025	GSM8KMath	—Unverified
Complexity-Based Prompting for Multi-Step Reasoning	Oct 3, 2022	Date UnderstandingGSM8K	—Unverified
Solving math word problems with process- and outcome-based feedback	Nov 25, 2022	Arithmetic ReasoningGSM8K	—Unverified
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning	Oct 3, 2024	GSM8KLanguage Modeling	—Unverified
Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning	Feb 26, 2025	GSM8KMathematical Reasoning	—Unverified
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths	May 30, 2024	GSM8KHumanEval	—Unverified
Steering LLM Reasoning Through Bias-Only Adaptation	May 24, 2025	GSM8KMath	—Unverified
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference	Feb 1, 2025	GPUGSM8K	—Unverified
Can Separators Improve Chain-of-Thought Prompting?	Feb 16, 2024	8kGSM8K	—Unverified
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation	Sep 5, 2024	GSM8K	—Unverified
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation	May 29, 2025	GSM8KMath	—Unverified
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Sep 10, 2024	GSM8KMixture-of-Experts	—Unverified
Subtle Errors Matter: Preference Learning via Error-injected Self-editing	Oct 9, 2024	GSM8KMath	—Unverified
Building Math Agents with Multi-Turn Iterative Preference Learning	Sep 4, 2024	GSM8KMath	—Unverified

Show:10 25 50

← PrevPage 15 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified