| Synthetic Data RL: Task Definition Is All You Need | May 18, 2025 | All, GSM8K | Code Available | 2 |
| SLOT: Sample-specific Language Model Optimization at Test-time | May 18, 2025 | GSM8K, Language Modeling | Code Available | 2 |
| Dynamic Early Exit in Reasoning Models | Apr 22, 2025 | GSM8K, Math | Code Available | 2 |
| SEAL: Steerable Reasoning Calibration of Large Language Models for Free | Apr 7, 2025 | GSM8K | Code Available | 2 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPU, GSM8K | Code Available | 2 |
| Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Mar 21, 2025 | GSM8K, Question Answering | Code Available | 2 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Feb 24, 2025 | GSM8K, Math | Code Available | 2 |
| SIFT: Grounding LLM Reasoning in Contexts via Stickers | Feb 19, 2025 | GSM8K, Math | Code Available | 2 |
| CoT-Valve: Length-Compressible Chain-of-Thought Tuning | Feb 13, 2025 | GSM8K | Code Available | 2 |
| Natural Language Fine-Tuning | Dec 29, 2024 | GSM8K, Large Language Model | Code Available | 2 |