| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8KMath | CodeCode Available | 0 |
| Incremental Sequence Classification with Temporal Consistency | May 22, 2025 | ClassificationLanguage Modeling | —Unverified | 0 |
| Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | May 22, 2025 | AttributeMath | —Unverified | 0 |
| X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs | May 22, 2025 | ChatbotMath | CodeCode Available | 0 |
| Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs | May 22, 2025 | DiagnosticMachine Unlearning | CodeCode Available | 1 |
| Training Step-Level Reasoning Verifiers with Formal Verification Tools | May 21, 2025 | Formal LogicMath | CodeCode Available | 1 |
| Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | May 21, 2025 | BenchmarkingMath | —Unverified | 0 |
| How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study | May 21, 2025 | Math | CodeCode Available | 0 |
| MAPS: A Multilingual Benchmark for Global Agent Performance and Security | May 21, 2025 | Code GenerationMath | —Unverified | 0 |
| SSR: Speculative Parallel Scaling Reasoning in Test-time | May 21, 2025 | DiversityMath | —Unverified | 0 |
| MIRB: Mathematical Information Retrieval Benchmark | May 21, 2025 | Automated Theorem ProvingInformation Retrieval | CodeCode Available | 0 |
| The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | May 21, 2025 | Math | CodeCode Available | 1 |
| Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | May 21, 2025 | GSM8KLearning-To-Rank | —Unverified | 0 |
| ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | May 21, 2025 | Mathvalid | CodeCode Available | 1 |
| RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | May 21, 2025 | MathMathematical Reasoning | CodeCode Available | 2 |
| Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | May 21, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Meta-Design Matters: A Self-Design Multi-Agent System | May 21, 2025 | MathProblem Decomposition | CodeCode Available | 2 |
| Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | May 21, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| EasyMath: A 0-shot Math Benchmark for SLMs | May 20, 2025 | Math | —Unverified | 0 |
| RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | May 20, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | May 20, 2025 | MathOffline RL | —Unverified | 0 |
| Let's Verify Math Questions Step by Step | May 20, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| The Hallucination Tax of Reinforcement Finetuning | May 20, 2025 | HallucinationMath | —Unverified | 0 |
| General-Reasoner: Advancing LLM Reasoning Across All Domains | May 20, 2025 | AllMath | CodeCode Available | 3 |
| TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | May 20, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 1 |