| RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | May 20, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| The Hallucination Tax of Reinforcement Finetuning | May 20, 2025 | Hallucination, Math | Unverified | 0 |
| Let's Verify Math Questions Step by Step | May 20, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | May 20, 2025 | Math, Reinforcement Learning (RL) | Code Available | 1 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | All, Math | Code Available | 3 |
| Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | May 19, 2025 | HumanEval, Math | Code Available | 0 |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | May 19, 2025 | GSM8K, Math | Code Available | 2 |
| AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database | May 19, 2025 | Data Augmentation, In-Context Learning | Unverified | 0 |
| AdaptThink: Reasoning Models Can Learn When to Think | May 19, 2025 | Math | Code Available | 2 |
| Thinkless: LLM Learns When to Think | May 19, 2025 | GSM8K, Math | Code Available | 3 |