| Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | May 23, 2024 | Computational Efficiency, GSM8K | Code Available | 1 | 5 |
| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Oct 27, 2024 | GSM8K, HellaSwag | Code Available | 1 | 5 |
| Markovian Transformers for Informative Language Modeling | Apr 29, 2024 | GSM8K, Informativeness | Code Available | 1 | 5 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8K, Math | Code Available | 1 | 5 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 1 | 5 |
| UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Nov 11, 2024 | Code Generation, GSM8K | Code Available | 1 | 5 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Oct 3, 2023 | GSM8K, Math | Code Available | 0 | 5 |
| COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement | Oct 12, 2024 | Code Generation, Computational Efficiency | Code Available | 0 | 5 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Jul 4, 2024 | ARC, GSM8K | Code Available | 0 | 5 |
| The Price of Format: Diversity Collapse in LLMs | May 25, 2025 | Diversity, GSM8K | Code Available | 0 | 5 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Apr 2, 2025 | GSM8K, Mathematical Problem-Solving | Code Available | 0 | 5 |
| Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Aug 21, 2023 | GSM8K | Code Available | 0 | 5 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 | 5 |
| Activation Steering for Chain-of-Thought Compression | Jul 7, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations | Nov 22, 2023 | Common Sense Reasoning, GSM8K | Code Available | 0 | 5 |
| Text-to-LoRA: Instant Transformer Adaption | Jun 6, 2025 | ARC, GSM8K | Code Available | 0 | 5 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| SMART: Self-learning Meta-strategy Agent for Reasoning Tasks | Oct 21, 2024 | GSM8K, Self-Learning | Code Available | 0 | 5 |
| CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Feb 28, 2025 | GSM8K | Code Available | 0 | 5 |
| Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Dec 18, 2024 | GSM8K, Knowledge Distillation | Code Available | 0 | 5 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Apr 2, 2025 | GSM8K, Logical Reasoning | Code Available | 0 | 5 |
| EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Sep 16, 2023 | Date Understanding, GSM8K | Code Available | 0 | 5 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Jun 16, 2024 | Continual Learning, GSM8K | Code Available | 0 | 5 |
| LLM2: Let Large Language Models Harness System 2 Reasoning | Dec 29, 2024 | GSM8K, Mathematical Reasoning | Code Available | 0 | 5 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Oct 19, 2023 | GSM8K, Math | Code Available | 0 | 5 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Oct 17, 2024 | GSM8K, Language Modeling | Code Available | 0 | 5 |
| Re-Initialization Token Learning for Tool-Augmented Large Language Models | Jun 17, 2025 | GSM8K, Question Answering | Code Available | 0 | 5 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Jun 12, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Feb 16, 2025 | Computational Efficiency, GSM8K | Code Available | 0 | 5 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Apr 3, 2025 | GSM8K, Reinforcement Learning (RL) | Code Available | 0 | 5 |
| Can LLMs Reason in the Wild with Programs? | Jun 19, 2024 | GSM8K, Math | Code Available | 0 | 5 |
| Scaling Speculative Decoding with Lookahead Reasoning | Jun 24, 2025 | GPU, GSM8K | Code Available | 0 | 5 |
| DIVE: Diversified Iterative Self-Improvement | Jan 1, 2025 | Diversity, GSM8K | Code Available | 0 | 5 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Jan 14, 2025 | GSM8K, Math | Code Available | 0 | 5 |
| Distilling Reasoning Capabilities into Smaller Language Models | Dec 1, 2022 | GSM8K, Knowledge Distillation | Code Available | 0 | 5 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 | 5 |
| Discriminative Policy Optimization for Token-Level Reward Models | May 29, 2025 | GSM8K, Language Modeling | Code Available | 0 | 5 |
| DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory | Jan 11, 2025 | GSM8K, Quantization | Code Available | 0 | 5 |
| PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | May 23, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 0 | 5 |
| Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | May 24, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 0 | 5 |
| Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Jun 20, 2024 | GSM8K, Language Model Evaluation | Code Available | 0 | 5 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Mar 4, 2025 | GSM8K, Logical Reasoning | Code Available | 0 | 5 |
| Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving | Dec 20, 2024 | Computational Efficiency, GSM8K | Code Available | 0 | 5 |
| In-Context Principle Learning from Mistakes | Feb 8, 2024 | GSM8K, In-Context Learning | Code Available | 0 | 5 |
| A mixed policy to improve performance of language models on math problems | Jul 17, 2023 | GSM8K, Math | Code Available | 0 | 5 |
| How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Oct 14, 2024 | Density Ratio Estimation, GSM8K | Code Available | 0 | 5 |
| DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Jul 16, 2025 | GSM8K | Code Available | 0 | 5 |
| NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Feb 20, 2025 | GSM8K, Natural Language Understanding | Code Available | 0 | 5 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8K, Math | Code Available | 0 | 5 |