| Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement | May 23, 2023 | GSM8K | CodeCode Available | 1 |
| Solving Math Word Problems by Combining Language Models With Symbolic Solvers | Apr 16, 2023 | GSM8KLanguage Modeling | CodeCode Available | 1 |
| Boosted Prompt Ensembles for Large Language Models | Apr 12, 2023 | GSM8KLanguage Modeling | CodeCode Available | 1 |
| Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning | Jan 27, 2023 | Few-Shot LearningGSM8K | CodeCode Available | 1 |
| Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions | May 28, 2022 | Arithmetic ReasoningEfficient Exploration | CodeCode Available | 1 |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | Mar 21, 2022 | ARCArithmetic Reasoning | CodeCode Available | 1 |
| GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems | Jul 17, 2025 | DiversityGSM8K | —Unverified | 0 |
| DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Jul 16, 2025 | GSM8K | CodeCode Available | 0 |
| KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning? | Jul 15, 2025 | GSM8KLanguage Modeling | —Unverified | 0 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | Jul 8, 2025 | GSM8KMath | —Unverified | 0 |
| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8KMath | CodeCode Available | 0 |
| Scaling Speculative Decoding with Lookahead Reasoning | Jun 24, 2025 | GPUGSM8K | CodeCode Available | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code CompletionGSM8K | —Unverified | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8KHumanEval | CodeCode Available | 0 |
| Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute | Jun 18, 2025 | continuous-controlContinuous Control | —Unverified | 0 |
| Excessive Reasoning Attack on Reasoning LLMs | Jun 17, 2025 | GSM8K | —Unverified | 0 |
| Re-Initialization Token Learning for Tool-Augmented Large Language Models | Jun 17, 2025 | GSM8KQuestion Answering | CodeCode Available | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | Jun 17, 2025 | ARCCoLA | —Unverified | 0 |
| LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | Jun 13, 2025 | GSM8KMathematical Reasoning | —Unverified | 0 |
| Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | Jun 12, 2025 | GSM8K | —Unverified | 0 |
| PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | Jun 12, 2025 | GSM8KMathematical Reasoning | —Unverified | 0 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Jun 12, 2025 | GSM8KMath | CodeCode Available | 0 |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational EfficiencyGSM8K | —Unverified | 0 |
| Unsupervised Elicitation of Language Models | Jun 11, 2025 | GSM8KTruthfulQA | CodeCode Available | 0 |
| Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | Jun 10, 2025 | GSM8KMath | —Unverified | 0 |
| Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | Jun 9, 2025 | GSM8KHumanEval | —Unverified | 0 |
| Text-to-LoRA: Instant Transformer Adaption | Jun 6, 2025 | ARCGSM8K | CodeCode Available | 0 |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | Jun 5, 2025 | GSM8KMath | —Unverified | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8KMathematical Problem-Solving | —Unverified | 0 |
| Model Unlearning via Sparse Autoencoder Subspace Guided Projections | May 30, 2025 | Adversarial Robustnessfeature selection | —Unverified | 0 |
| Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | May 29, 2025 | GSM8KMath | —Unverified | 0 |
| Discriminative Policy Optimization for Token-Level Reward Models | May 29, 2025 | GSM8KLanguage Modeling | CodeCode Available | 0 |
| Maximizing Confidence Alone Improves Reasoning | May 28, 2025 | GSM8KMath | —Unverified | 0 |
| CoThink: Token-Efficient Reasoning via Instruct Models Guiding Reasoning Models | May 28, 2025 | GSM8K | —Unverified | 0 |
| The Price of Format: Diversity Collapse in LLMs | May 25, 2025 | DiversityGSM8K | CodeCode Available | 0 |
| Efficient Data Selection at Scale via Influence Distillation | May 25, 2025 | GSM8KMMLU | —Unverified | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | May 25, 2025 | GSM8KHumanEval | —Unverified | 0 |
| System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts | May 25, 2025 | GSM8K | —Unverified | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | May 24, 2025 | GSM8KMath | —Unverified | 0 |
| AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting | May 24, 2025 | GSM8KReinforcement Learning (RL) | CodeCode Available | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8KMath | CodeCode Available | 0 |
| PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models | May 22, 2025 | GSM8KLarge Language Model | —Unverified | 0 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | May 21, 2025 | GSM8KLearning-To-Rank | —Unverified | 0 |
| Dual Decomposition of Weights and Singular Value Low Rank Adaptation | May 20, 2025 | GSM8KMMLU | —Unverified | 0 |
| DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | May 20, 2025 | GSM8KMathematical Reasoning | —Unverified | 0 |
| Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | May 20, 2025 | ARCGSM8K | —Unverified | 0 |
| RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs | May 19, 2025 | GSM8K | —Unverified | 0 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code GenerationGSM8K | —Unverified | 0 |
| Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | May 13, 2025 | Domain GeneralizationGSM8K | —Unverified | 0 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | May 12, 2025 | GSM8KHumanEval | —Unverified | 0 |