| MathArena: Evaluating LLMs on Uncontaminated Math Competitions | May 29, 2025 | MathMathematical Reasoning | CodeCode Available | 3 |
| LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | BenchmarkingCode Generation | CodeCode Available | 0 |
| DINGO: Constrained Inference for Diffusion LLMs | May 29, 2025 | Math | —Unverified | 0 |
| Maximizing Confidence Alone Improves Reasoning | May 28, 2025 | GSM8KMath | —Unverified | 0 |
| Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | May 28, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 2 |
| Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | May 28, 2025 | MathMultimodal Reasoning | CodeCode Available | 1 |
| Skywork Open Reasoner 1 Technical Report | May 28, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 4 |
| ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark | May 28, 2025 | Math | CodeCode Available | 0 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | May 28, 2025 | MathMathematical Problem-Solving | CodeCode Available | 0 |
| ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | May 28, 2025 | Imitation LearningMath | CodeCode Available | 1 |