| ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | Apr 15, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Heimdall: test-time scaling on the generative verification | Apr 14, 2025 | Math | —Unverified | 0 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8K, Math | —Unverified | 0 |
| GPT Carry-On: Training Foundation Model for Customization Could Be Simple, Scalable and Affordable | Apr 10, 2025 | GPU, Math | —Unverified | 0 |
| MDIT: A Model-free Data Interpolation Method for Diverse Instruction Tuning | Apr 9, 2025 | Code Generation, Diversity | —Unverified | 0 |
| Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Apr 7, 2025 | GSM8K, Math | —Unverified | 0 |
| Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification | Apr 7, 2025 | Logical Reasoning, Math | —Unverified | 0 |
| Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning | Apr 6, 2025 | Math | —Unverified | 0 |
| oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science | Apr 5, 2025 | Math | —Unverified | 0 |
| Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | Apr 4, 2025 | Math, reinforcement-learning | —Unverified | 0 |