| Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence | May 23, 2025 | GPU, Large Language Model | Unverified | 0 |
| RaDeR: Reasoning-aware Dense Retrieval Models | May 23, 2025 | Math, Mathematical Problem-Solving | Code Available | 1 |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | May 23, 2025 | Cross-Lingual Transfer, Math | Unverified | 0 |
| Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN | May 22, 2025 | Mathematical Reasoning | Unverified | 0 |
| Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms | May 22, 2025 | Adversarial Attack, Benchmarking | Unverified | 0 |
| Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains | May 22, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning | May 22, 2025 | Mathematical Reasoning, reinforcement-learning | Code Available | 1 |
| Dynamic Sampling that Adapts: Iterative DPO for Self-Aware Mathematical Reasoning | May 22, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving | May 22, 2025 | Diagnostic, Mathematical Problem-Solving | Unverified | 0 |
| HOFT: Householder Orthogonal Fine-tuning | May 22, 2025 | Machine Translation, Mathematical Reasoning | Unverified | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8K, Math | Code Available | 0 |
| MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models | May 22, 2025 | Mathematical Reasoning | Unverified | 0 |
| Bottlenecked Transformers: Periodic KV Cache Abstraction for Generalised Reasoning | May 22, 2025 | Mathematical Reasoning | Unverified | 0 |
| Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | May 21, 2025 | Benchmarking, Math | Unverified | 0 |
| MAPS: A Multilingual Benchmark for Global Agent Performance and Security | May 21, 2025 | Code Generation, Math | Unverified | 0 |
| SSR: Speculative Parallel Scaling Reasoning in Test-time | May 21, 2025 | Diversity, Math | Unverified | 0 |
| Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | May 21, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning | May 21, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | May 21, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | May 21, 2025 | GSM8K, Learning-To-Rank | Unverified | 0 |
| MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem | May 20, 2025 | Mathematical Reasoning, scientific discovery | Code Available | 3 |
| SCOPE: Compress Mathematical Reasoning Steps for Efficient Automated Process Annotation | May 20, 2025 | Mathematical Reasoning | Code Available | 0 |
| Beyond the First Error: Process Reward Models for Reflective Mathematical Reasoning | May 20, 2025 | Mathematical Reasoning | Unverified | 0 |
| Text Generation Beyond Discrete Token Sampling | May 20, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models | May 20, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 1 |
| Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning | May 20, 2025 | Large Language Model, Mathematical Reasoning | Unverified | 0 |
| DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | May 20, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum | May 20, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| Let's Verify Math Questions Step by Step | May 20, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning | May 20, 2025 | Hallucination, Mathematical Reasoning | Code Available | 5 |
| Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | May 20, 2025 | Logical Reasoning, Mathematical Reasoning | Unverified | 0 |
| WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | May 20, 2025 | Mathematical Reasoning, Multiple-choice | Unverified | 0 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | All, Math | Code Available | 3 |
| OSoRA: Output-Dimension and Singular-Value Initialized Low-Rank Adaptation | May 20, 2025 | Common Sense Reasoning, Mathematical Reasoning | Unverified | 0 |
| Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers | May 19, 2025 | In-Context Learning, Instruction Following | Unverified | 0 |
| Selective Code Generation for Functional Guarantees | May 19, 2025 | Code Generation, Hallucination | Unverified | 0 |
| Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents | May 19, 2025 | Mathematical Reasoning | Unverified | 0 |
| Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning | May 19, 2025 | 2k, Mathematical Reasoning | Unverified | 0 |
| MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO | May 19, 2025 | Decoder, Image Generation | Code Available | 0 |
| Optimizing Anytime Reasoning via Budget Relative Policy Optimization | May 19, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs | May 19, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards | May 19, 2025 | Mathematical Reasoning | Code Available | 1 |
| MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | May 19, 2025 | Math, Mathematical Reasoning | Code Available | 4 |
| AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database | May 19, 2025 | Data Augmentation, In-Context Learning | Unverified | 0 |
| RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics | May 18, 2025 | Mathematical Reasoning | Code Available | 1 |
| DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization | May 18, 2025 | Mathematical Reasoning | Code Available | 2 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization | May 18, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |
| Token-Level Uncertainty Estimation for Large Language Model Reasoning | May 16, 2025 | Language Modeling, Language Modelling | Unverified | 0 |