GSM8K

Papers

Showing 26–50 of 439 papers

Title	Date	Tasks	Status	Hype
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties	Jun 6, 2025	GSM8K	Code Available	1
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers	Jun 5, 2025	GSM8K, Math	Unverified	0
Evaluation of LLMs for mathematical problem solving	May 30, 2025	GSM8K, Mathematical Problem-Solving	Unverified	0
Model Unlearning via Sparse Autoencoder Subspace Guided Projections	May 30, 2025	Adversarial Robustness, feature selection	Unverified	0
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models	May 29, 2025	2k, 4k	Code Available	1
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation	May 29, 2025	GSM8K, Math	Unverified	0
Discriminative Policy Optimization for Token-Level Reward Models	May 29, 2025	GSM8K, Language Modeling	Code Available	0
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models	May 28, 2025	GSM8K	Unverified	0
Maximizing Confidence Alone Improves Reasoning	May 28, 2025	GSM8K, Math	Unverified	0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models	May 25, 2025	GSM8K, HumanEval	Unverified	0
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts	May 25, 2025	GSM8K	Unverified	0
The Price of Format: Diversity Collapse in LLMs	May 25, 2025	Diversity, GSM8K	Code Available	0
Efficient Data Selection at Scale via Influence Distillation	May 25, 2025	GSM8K, MMLU	Unverified	0
Steering LLM Reasoning Through Bias-Only Adaptation	May 24, 2025	GSM8K, Math	Unverified	0
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting	May 24, 2025	GSM8K, Reinforcement Learning (RL)	Code Available	0
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models	May 22, 2025	GSM8K, Large Language Model	Unverified	0
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	May 22, 2025	GSM8K, Math	Code Available	0
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision	May 21, 2025	GSM8K, Learning-To-Rank	Unverified	0
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst	May 20, 2025	ARC, GSM8K	Unverified	0
Dual Decomposition of Weights and Singular Value Low Rank Adaptation	May 20, 2025	GSM8K, MMLU	Unverified	0
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models	May 20, 2025	GSM8K, Mathematical Reasoning	Unverified	0
Let LLMs Break Free from Overthinking via Self-Braking Tuning	May 20, 2025	GSM8K	Code Available	2
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs	May 19, 2025	GSM8K	Unverified	0
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space	May 19, 2025	GSM8K, Math	Code Available	2
Thinkless: LLM Learns When to Think	May 19, 2025	GSM8K, Math	Code Available	3


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified