| System-2 Mathematical Reasoning via Enriched Instruction Tuning | Dec 22, 2024 | ERP, GSM8K | —Unverified | 0 |
| Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | Mar 18, 2025 | GSM8K, Math | —Unverified | 0 |
| Teaching Small Language Models to Reason | Dec 16, 2022 | GSM8K, Knowledge Distillation | —Unverified | 0 |
| The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback | Oct 31, 2023 | GSM8K, MMLU | —Unverified | 0 |
| The ART of LLM Refinement: Ask, Refine, and Trust | Nov 14, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| The Role of Deductive and Inductive Reasoning in Large Language Models | Oct 3, 2024 | GSM8K | —Unverified | 0 |
| The Unreasonable Effectiveness of Eccentric Automatic Prompts | Feb 9, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 |
| Think before you speak: Training Language Models With Pause Tokens | Oct 3, 2023 | Decoder, GSM8K | —Unverified | 0 |
| Think Beyond Size: Adaptive Prompting for More Effective Reasoning | Oct 10, 2024 | Arithmetic Reasoning, Computational Efficiency | —Unverified | 0 |
| Threshold Filtering Packing for Supervised Fine-Tuning: Training Related Samples within Packs | Aug 18, 2024 | Diversity, GPU | —Unverified | 0 |