| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | Feb 27, 2025 | GSM8K, HumanEval | Unverified | 0 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8K, Math | Code Available | 1 |
| Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | Feb 26, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? | Feb 26, 2025 | GSM8K, MMLU | Unverified | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | Feb 25, 2025 | Continual Learning, GSM8K | Unverified | 0 |
| LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint | Feb 24, 2025 | GSM8K | Unverified | 0 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Feb 24, 2025 | GSM8K, Math | Code Available | 2 |
| Dynamic Parallel Tree Search for Efficient LLM Reasoning | Feb 22, 2025 | Computational Efficiency, GSM8K | Unverified | 0 |
| NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Feb 20, 2025 | GSM8K, Natural Language Understanding | Code Available | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 |
| SIFT: Grounding LLM Reasoning in Contexts via Stickers | Feb 19, 2025 | GSM8K, Math | Code Available | 2 |
| From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education | Feb 19, 2025 | Diagnostic, GSM8K | Unverified | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Feb 19, 2025 | Dataset Generation, GSM8K | Code Available | 0 |
| Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models | Feb 18, 2025 | Data Augmentation, GSM8K | Unverified | 0 |
| SMART: Self-Aware Agent for Tool Overuse Mitigation | Feb 17, 2025 | GSM8K, Large Language Model | Code Available | 1 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | Feb 17, 2025 | Code Completion, GSM8K | Unverified | 0 |
| TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Feb 17, 2025 | GSM8K | Code Available | 3 |
| Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | Feb 16, 2025 | GSM8K, Thompson Sampling | Unverified | 0 |
| Balancing the Budget: Understanding Trade-offs Between Supervised and Preference-Based Finetuning | Feb 16, 2025 | GSM8K | Unverified | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Feb 16, 2025 | Computational Efficiency, GSM8K | Code Available | 0 |
| Leveraging Uncertainty Estimation for Efficient LLM Routing | Feb 16, 2025 | GSM8K, MMLU | Unverified | 0 |
| Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | Feb 14, 2025 | GSM8K, Inference Optimization | Unverified | 0 |
| CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Feb 13, 2025 | GSM8K | Code Available | 2 |
| Cost-Saving LLM Cascades with Early Abstention | Feb 13, 2025 | GSM8K, MMLU | Unverified | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8K, Math | Code Available | 0 |
| Self-Training Large Language Models for Tool-Use Without Demonstrations | Feb 9, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | Unverified | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8K, HumanEval | Unverified | 0 |
| Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting | Feb 5, 2025 | GSM8K, Math | Code Available | 0 |
| BARE: Leveraging Base Language Models for Few-Shot Synthetic Data Generation | Feb 3, 2025 | Diversity, GSM8K | Unverified | 0 |
| ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | Feb 1, 2025 | GPU, GSM8K | Unverified | 0 |
| Pheromone-based Learning of Optimal Reasoning Paths | Jan 31, 2025 | ARC, GSM8K | Unverified | 0 |
| RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations | Jan 25, 2025 | Computational Efficiency, GSM8K | Unverified | 0 |
| Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | Jan 21, 2025 | GSM8K, In-Context Learning | Unverified | 0 |
| MyGO Multiplex CoT: A Method for Self-Reflection in Large Language Models via Double Chain of Thought Thinking | Jan 20, 2025 | Decision Making, GSM8K | Code Available | 1 |
| DNA 1.0 Technical Report | Jan 18, 2025 | Belebele, GSM8K | Unverified | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Jan 14, 2025 | GSM8K, Math | Code Available | 0 |
| DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory | Jan 11, 2025 | GSM8K, Quantization | Code Available | 0 |
| Semantic Exploration with Adaptive Gating for Efficient Problem Solving with Language Models | Jan 10, 2025 | ARC, Diversity | Unverified | 0 |
| InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | Jan 6, 2025 | GSM8K, HumanEval | Unverified | 0 |
| Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models | Jan 3, 2025 | GSM8K, Math | Unverified | 0 |
| DIVE: Diversified Iterative Self-Improvement | Jan 1, 2025 | Diversity, GSM8K | Code Available | 0 |
| Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | Dec 30, 2024 | GSM8K | Unverified | 0 |
| LLM2: Let Large Language Models Harness System 2 Reasoning | Dec 29, 2024 | GSM8K, Mathematical Reasoning | Code Available | 0 |
| Natural Language Fine-Tuning | Dec 29, 2024 | GSM8K, Large Language Model | Code Available | 2 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Dec 23, 2024 | Arithmetic Reasoning, GSM8K | Unverified | 0 |
| Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions | Dec 22, 2024 | GSM8K, Math | Unverified | 0 |
| System-2 Mathematical Reasoning via Enriched Instruction Tuning | Dec 22, 2024 | ERP, GSM8K | Unverified | 0 |
| Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving | Dec 20, 2024 | Computational Efficiency, GSM8K | Code Available | 0 |
| Offline Reinforcement Learning for LLM Multi-Step Reasoning | Dec 20, 2024 | GSM8K, Math | Code Available | 2 |