| DRLE: Decentralized Reinforcement Learning at the Edge for Traffic Light Control in the IoV | Sep 3, 2020 | Edge-computing, Management | Code Available | 2 |
| Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Jul 14, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Jul 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Jul 8, 2025 | Active Learning, Automated Theorem Proving | Code Available | 1 |
| Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Jun 17, 2025 | Code Generation, GSM8K | Code Available | 1 |
| RePO: Replay-Enhanced Policy Optimization | Jun 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jun 9, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| Safe: Enhancing Mathematical Reasoning in Large Language Models via Retrospective Step-aware Formal Verification | Jun 5, 2025 | Automated Theorem Proving, Hallucination | Code Available | 1 |
| The Hallucination Dilemma: Factuality-Aware Reinforcement Learning for Large Reasoning Models | May 30, 2025 | Hallucination, Mathematical Reasoning | Code Available | 1 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | May 30, 2025 | Math, Mathematical Reasoning | Code Available | 1 |