GSM8K

Papers

Showing 151–200 of 439 papers

Title	Date	Tasks	Status	Hype
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement	May 23, 2023	GSM8K	Code Available	1
Solving Math Word Problems by Combining Language Models With Symbolic Solvers	Apr 16, 2023	GSM8K, Language Modeling	Code Available	1
Boosted Prompt Ensembles for Large Language Models	Apr 12, 2023	GSM8K, Language Modeling	Code Available	1
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning	Jan 27, 2023	Few-Shot Learning, GSM8K	Code Available	1
Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions	May 28, 2022	Arithmetic Reasoning, Efficient Exploration	Code Available	1
Self-Consistency Improves Chain of Thought Reasoning in Language Models	Mar 21, 2022	ARC, Arithmetic Reasoning	Code Available	1
GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems	Jul 17, 2025	Diversity, GSM8K	Unverified	0
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression	Jul 16, 2025	GSM8K	Code Available	0
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?	Jul 15, 2025	GSM8K, Language Modeling	Unverified	0
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs	Jul 8, 2025	GSM8K, Math	Unverified	0
Activation Steering for Chain-of-Thought Compression	Jul 7, 2025	GSM8K, Math	Code Available	0
Scaling Speculative Decoding with Lookahead Reasoning	Jun 24, 2025	GPU, GSM8K	Code Available	0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models	Jun 23, 2025	Code Completion, GSM8K	Unverified	0
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need	Jun 18, 2025	GSM8K, HumanEval	Code Available	0
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute	Jun 18, 2025	continuous-control, Continuous Control	Unverified	0
Excessive Reasoning Attack on Reasoning LLMs	Jun 17, 2025	GSM8K	Unverified	0
Re-Initialization Token Learning for Tool-Augmented Large Language Models	Jun 17, 2025	GSM8K, Question Answering	Code Available	0
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing	Jun 17, 2025	ARC, CoLA	Unverified	0
LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment	Jun 13, 2025	GSM8K, Mathematical Reasoning	Unverified	0
Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty	Jun 12, 2025	GSM8K	Unverified	0
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models	Jun 12, 2025	GSM8K, Mathematical Reasoning	Unverified	0
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling	Jun 12, 2025	GSM8K, Math	Code Available	0
Slimming Down LLMs Without Losing Their Minds	Jun 12, 2025	Computational Efficiency, GSM8K	Unverified	0
Unsupervised Elicitation of Language Models	Jun 11, 2025	GSM8K, TruthfulQA	Unverified	0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search	Jun 10, 2025	GSM8K, Math	Unverified	0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8K, HumanEval	Unverified	0
Text-to-LoRA: Instant Transformer Adaption	Jun 6, 2025	ARC, GSM8K	Code Available	0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers	Jun 5, 2025	GSM8K, Math	Unverified	0
Evaluation of LLMs for mathematical problem solving	May 30, 2025	GSM8K, Mathematical Problem-Solving	Unverified	0
Model Unlearning via Sparse Autoencoder Subspace Guided Projections	May 30, 2025	Adversarial Robustness, feature selection	Unverified	0
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation	May 29, 2025	GSM8K, Math	Unverified	0
Discriminative Policy Optimization for Token-Level Reward Models	May 29, 2025	GSM8K, Language Modeling	Code Available	0
Maximizing Confidence Alone Improves Reasoning	May 28, 2025	GSM8K, Math	Unverified	0
CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models	May 28, 2025	GSM8K	Unverified	0
The Price of Format: Diversity Collapse in LLMs	May 25, 2025	Diversity, GSM8K	Code Available	0
Efficient Data Selection at Scale via Influence Distillation	May 25, 2025	GSM8K, MMLU	Unverified	0
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models	May 25, 2025	GSM8K, HumanEval	Unverified	0
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts	May 25, 2025	GSM8K	Unverified	0
Steering LLM Reasoning Through Bias-Only Adaptation	May 24, 2025	GSM8K, Math	Unverified	0
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting	May 24, 2025	GSM8K, Reinforcement Learning (RL)	Code Available	0
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	May 22, 2025	GSM8K, Math	Code Available	0
PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models	May 22, 2025	GSM8K, Large Language Model	Unverified	0
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision	May 21, 2025	GSM8K, Learning-To-Rank	Unverified	0
Dual Decomposition of Weights and Singular Value Low Rank Adaptation	May 20, 2025	GSM8K, MMLU	Unverified	0
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models	May 20, 2025	GSM8K, Mathematical Reasoning	Unverified	0
Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst	May 20, 2025	ARC, GSM8K	Unverified	0
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs	May 19, 2025	GSM8K	Unverified	0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code Generation, GSM8K	Unverified	0
Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping	May 13, 2025	Domain Generalization, GSM8K	Unverified	0
AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection	May 12, 2025	GSM8K, HumanEval	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified