| Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | Jun 9, 2025 | GSM8K, HumanEval | Unverified | 0 |
| The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | Jun 7, 2025 | Math | Unverified | 0 |
| AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Jun 6, 2025 | Large Language Model, Math | Code Available | 0 |
| Spectral Derivatives | Jun 6, 2025 | Math | Code Available | 0 |
| SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms | Jun 6, 2025 | Diversity, Large Language Model | Unverified | 0 |
| Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning | Jun 5, 2025 | Math, Visual Grounding | Unverified | 0 |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | Jun 5, 2025 | GSM8K, Math | Unverified | 0 |
| Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback | Jun 5, 2025 | Math | Unverified | 0 |
| Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | Jun 5, 2025 | All, Math | Unverified | 0 |
| TreeRPO: Tree Relative Policy Optimization | Jun 5, 2025 | Math | Code Available | 0 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 |
| Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Jun 4, 2025 | Math | Code Available | 0 |
| Rectified Sparse Attention | Jun 4, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem | Jun 3, 2025 | GPU, Math | Unverified | 0 |
| MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching | Jun 3, 2025 | Data Augmentation, Instruction Following | Unverified | 0 |
| SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis | Jun 2, 2025 | 8k, Math | Unverified | 0 |
| Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning | Jun 2, 2025 | Machine Unlearning, Math | Code Available | 0 |
| Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains | Jun 2, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 |
| Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | May 30, 2025 | Math, Multiple-choice | Code Available | 0 |
| Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking | May 30, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | May 30, 2025 | Math, reinforcement-learning | Unverified | 0 |
| Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | May 29, 2025 | GSM8K, Math | Unverified | 0 |
| Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | May 29, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Discriminative Policy Optimization for Token-Level Reward Models | May 29, 2025 | GSM8K, Language Modeling | Code Available | 0 |
| DINGO: Constrained Inference for Diffusion LLMs | May 29, 2025 | Math | Unverified | 0 |
| LLM Performance for Code Generation on Noisy Tasks | May 29, 2025 | Benchmarking, Code Generation | Code Available | 0 |
| PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics | May 29, 2025 | Math | Unverified | 0 |
| Matryoshka Model Learning for Improved Elastic Student Models | May 29, 2025 | LAMBADA, Math | Unverified | 0 |
| Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models | May 29, 2025 | Logical Reasoning, Math | Unverified | 0 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | May 28, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |
| ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark | May 28, 2025 | Math | Code Available | 0 |
| Maximizing Confidence Alone Improves Reasoning | May 28, 2025 | GSM8K, Math | Unverified | 0 |
| Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning | May 27, 2025 | Math | Unverified | 0 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | Hallucination, Math | Code Available | 0 |
| Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions | May 26, 2025 | Attribute, Math | Unverified | 0 |
| Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | May 26, 2025 | ARC, Logical Reasoning | Unverified | 0 |
| Inference-time Alignment in Continuous Space | May 26, 2025 | Math | Code Available | 0 |
| Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition | May 26, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models | May 26, 2025 | Contrastive Learning, Math | Code Available | 0 |
| Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning | May 26, 2025 | Diversity, Math | Unverified | 0 |
| Improving Multilingual Math Reasoning for African Languages | May 26, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| The Role of Diversity in In-Context Learning for Large Language Models | May 26, 2025 | Diversity, In-Context Learning | Unverified | 0 |
| Interleaved Reasoning for Large Language Models via Reinforcement Learning | May 26, 2025 | Logical Reasoning, Math | Unverified | 0 |
| Faster and Better LLMs via Latency-Aware Test-Time Scaling | May 26, 2025 | Math | Unverified | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | May 24, 2025 | Automated Theorem Proving, Math | Code Available | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | May 24, 2025 | GSM8K, Math | Unverified | 0 |
| Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? | May 24, 2025 | Code Generation, Math | Unverified | 0 |