| Training Chain-of-Thought via Latent-Variable Inference | Nov 28, 2023 | GSM8K | —Unverified | 0 | 0 |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Dec 4, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | Dec 12, 2024 | GSM8K, Knowledge Graphs | —Unverified | 0 | 0 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | May 21, 2025 | GSM8K, Learning-To-Rank | —Unverified | 0 | 0 |
| Learning to Reason via Self-Iterative Process Feedback for Small Language Models | Dec 11, 2024 | Domain Generalization, GSM8K | —Unverified | 0 | 0 |
| LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint | Feb 24, 2025 | GSM8K | —Unverified | 0 | 0 |
| LearnAlign: Reasoning Data Selection for Reinforcement Learning in Large Language Models Based on Improved Gradient Alignment | Jun 13, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| Let's Reinforce Step by Step | Nov 10, 2023 | GSM8K, Logical Reasoning | —Unverified | 0 | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | Oct 16, 2023 | Code Generation, GSM8K | —Unverified | 0 | 0 |
| Leveraging Uncertainty Estimation for Efficient LLM Routing | Feb 16, 2025 | GSM8K, MMLU | —Unverified | 0 | 0 |