| Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Jun 11, 2025 | Image Captioning, Math | Code Available | 2 |
| Resa: Transparent Reasoning Models via SAEs | Jun 11, 2025 | Math | Code Available | 1 |
| ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Jun 11, 2025 | Code Generation, Diagnostic | Code Available | 1 |
| TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games | Jun 11, 2025 | Logical Reasoning, Math | Unverified | 0 |
| Reinforce LLM Reasoning through Multi-Agent Reflection | Jun 10, 2025 | Math, Out-of-Distribution Generalization | Unverified | 0 |
| Learning to Reason Across Parallel Samples for LLM Reasoning | Jun 10, 2025 | Math, Re-Ranking | Unverified | 0 |
| LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs | Jun 10, 2025 | Large Language Model, Math | Unverified | 0 |
| Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | Jun 10, 2025 | GSM8K, Math | Unverified | 0 |
| SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Jun 10, 2025 | Knowledge Distillation, Math | Code Available | 1 |
| AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Jun 10, 2025 | Math | Code Available | 2 |