| System-2 Mathematical Reasoning via Enriched Instruction Tuning | Dec 22, 2024 | ERP, GSM8K | —Unverified | 0 |
| Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Mar 18, 2025 | GSM8K, Math | —Unverified | 0 |
| Teaching Small Language Models to Reason | Dec 16, 2022 | GSM8K, Knowledge Distillation | —Unverified | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Oct 31, 2023 | GSM8K, MMLU | —Unverified | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| The Role of Deductive and Inductive Reasoning in Large Language Models | Oct 3, 2024 | GSM8K | —Unverified | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | Feb 9, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| Think before you speak: Training Language Models With Pause Tokens | Oct 3, 2023 | Decoder, GSM8K | —Unverified | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | Oct 10, 2024 | Arithmetic Reasoning, Computational Efficiency | —Unverified | 0 |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | Aug 18, 2024 | Diversity, GPU | —Unverified | 0 |
| TinyGSM: achieving >80% on GSM8k with small language models | Dec 14, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models | Jul 12, 2024 | GSM8K, Math | —Unverified | 0 |
| Towards Multilingual LLM Evaluation for European Languages | Oct 11, 2024 | ARC, GSM8K | —Unverified | 0 |
| Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning | Dec 23, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | Apr 29, 2025 | GSM8K, Knowledge Distillation | —Unverified | 0 |
| Training Chain-of-Thought via Latent-Variable Inference | Nov 28, 2023 | GSM8K | —Unverified | 0 |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Dec 4, 2024 | GSM8K, Language Modeling | —Unverified | 0 |
| Training Large Language Models to Reason via EM Policy Gradient | Apr 24, 2025 | GSM8K, Math | —Unverified | 0 |
| Transcending Scaling Laws with 0.1% Extra Compute | Oct 20, 2022 | Arithmetic Reasoning, Cross-Lingual Question Answering | —Unverified | 0 |
| Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning | Apr 17, 2024 | GSM8K, Navigate | —Unverified | 0 |
| TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling | Oct 18, 2024 | Computational Efficiency, GSM8K | —Unverified | 0 |
| Uncertainty Aware Learning for Language Model Alignment | Jun 7, 2024 | GSM8K, Language Modeling | —Unverified | 0 |
| Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | Feb 16, 2025 | GSM8K, Thompson Sampling | —Unverified | 0 |
| Unlocking Structured Thinking in Language Models with Cognitive Prompting | Oct 3, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| Unraveling Arithmetic in Large Language Models: The Role of Algebraic Structures | Nov 25, 2024 | GSM8K, Math | —Unverified | 0 |