| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | Apr 21, 2025 | Math, MMLU | Unverified | 0 |
| Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Apr 21, 2025 | Code Generation, Instruction Following | Code Available | 0 |
| OTC: Optimal Tool Calls via Reinforcement Learning | Apr 21, 2025 | Math, reinforcement-learning | Unverified | 0 |
| Learning to Reason under Off-Policy Guidance | Apr 21, 2025 | Math, Reinforcement Learning (RL) | Code Available | 3 |
| Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction | Apr 21, 2025 | Math | Code Available | 2 |
| Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning | Apr 21, 2025 | All, Form | Code Available | 2 |
| Enhancing Math Learning in an LMS Using AI-Driven Question Recommendations | Apr 18, 2025 | Management, Math | Unverified | 0 |
| Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Apr 18, 2025 | Math, Visual Reasoning | Unverified | 0 |
| THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | Apr 17, 2025 | Benchmarking, Math | Unverified | 0 |
| MathPhys-Guided Coarse-to-Fine Anomaly Synthesis with SQE-Driven Bi-Level Optimization for Anomaly Detection | Apr 17, 2025 | Anomaly Detection, Data Augmentation | Unverified | 0 |