| Steering LLM Reasoning Through Bias-Only Adaptation | May 24, 2025 | GSM8K, Math | —Unverified | 0 |
| Anchored Diffusion Language Model | May 24, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization | May 24, 2025 | Math, Reinforcement Learning (RL) | —Unverified | 0 |
| MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors | May 24, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | May 23, 2025 | Cross-Lingual Transfer, Math | —Unverified | 0 |
| Outcome-based Reinforcement Learning to Predict the Future | May 23, 2025 | Holdout Set, Math | —Unverified | 0 |
| VideoGameBench: Can Vision-Language Models complete popular video games? | May 23, 2025 | Math | —Unverified | 0 |
| One RL to See Them All: Visual Triple Unified Reinforcement Learning | May 23, 2025 | All, Math | —Unverified | 0 |
| More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models | May 23, 2025 | Diagnostic, Hallucination | —Unverified | 0 |
| AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | May 22, 2025 | Math, reinforcement-learning | —Unverified | 0 |