| Large Language Models for Design Structure Matrix Optimization | Jun 11, 2025 | Combinatorial Optimization, Mathematical Reasoning | Unverified | 0 |
| Towards Efficient and Effective Alignment of Large Language Models | Jun 11, 2025 | Mathematical Reasoning, Meta-Learning | Unverified | 0 |
| Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens | Jun 10, 2025 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| A Survey on Large Language Models for Mathematical Reasoning | Jun 10, 2025 | Answer Generation, Mathematical Reasoning | Unverified | 0 |
| Can A Gamer Train A Mathematical Reasoning Model? | Jun 10, 2025 | GPU, Mathematical Reasoning | Code Available | 0 |
| VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | Jun 10, 2025 | Mathematical Reasoning, Visual Reasoning | Code Available | 0 |
| Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic | Jun 9, 2025 | Mathematical Reasoning | Unverified | 0 |
| Can Theoretical Physics Research Benefit from Language Agents? | Jun 6, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning | Jun 5, 2025 | Diversity, Mathematical Reasoning | Unverified | 0 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Jun 5, 2025 | Arithmetic Reasoning, Math | Code Available | 0 |
| Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning | Jun 5, 2025 | Mathematical Reasoning, Problem Decomposition | Unverified | 0 |
| Multi-Layer GRPO: Enhancing Reasoning and Self-Correction in Large Language Models | Jun 5, 2025 | Mathematical Reasoning | Unverified | 0 |
| ProRefine: Inference-time Prompt Refinement with Textual Feedback | Jun 5, 2025 | Mathematical Reasoning | Unverified | 0 |
| VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos | Jun 5, 2025 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| LogicPuzzleRL: Cultivating Robust Mathematical Reasoning in LLMs via Reinforcement Learning | Jun 5, 2025 | Mathematical Reasoning, reinforcement-learning | Code Available | 0 |
| Adaptive Graph Pruning for Multi-Agent Communication | Jun 3, 2025 | Code Generation, Large Language Model | Code Available | 0 |
| WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks | Jun 2, 2025 | Large Language Model, Mathematical Reasoning | Unverified | 0 |
| Uni-LoRA: One Vector is All You Need | Jun 1, 2025 | All, Mathematical Reasoning | Unverified | 0 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 |
| Speculative Reward Model Boosts Decision Making Ability of LLMs Cost-Effectively | May 31, 2025 | Decision Making, Mathematical Reasoning | Code Available | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8K, Mathematical Problem-Solving | Unverified | 0 |
| RMoA: Optimizing Mixture-of-Agents through Diversity Maximization and Residual Compensation | May 30, 2025 | Code Generation, Diversity | Code Available | 0 |
| Scaling up the think-aloud method | May 29, 2025 | Mathematical Reasoning | Code Available | 0 |
| Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt | May 29, 2025 | Mathematical Reasoning | Unverified | 0 |
| Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness | May 29, 2025 | Diversity, Large Language Model | Unverified | 0 |
| Diversity-Aware Policy Optimization for Large Language Model Reasoning | May 29, 2025 | Diversity, Language Modeling | Unverified | 0 |
| Discriminative Policy Optimization for Token-Level Reward Models | May 29, 2025 | GSM8K, Language Modeling | Code Available | 0 |
| AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning | May 29, 2025 | Geometry Problem Solving, Mathematical Reasoning | Unverified | 0 |
| On-Policy RL with Optimal Reward Baseline | May 29, 2025 | Large Language Model, Mathematical Reasoning | Unverified | 0 |
| Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | May 29, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Probability-Consistent Preference Optimization for Enhanced LLM Reasoning | May 29, 2025 | Mathematical Reasoning | Code Available | 0 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | May 28, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |
| Don't Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models | May 27, 2025 | Mathematical Reasoning | Unverified | 0 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | May 26, 2025 | Hallucination, Math | Code Available | 0 |
| Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | May 26, 2025 | ARC, Logical Reasoning | Unverified | 0 |
| Improving Multilingual Math Reasoning for African Languages | May 26, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| HS-STAR: Hierarchical Sampling for Self-Taught Reasoners via Difficulty Estimation and Budget Reallocation | May 26, 2025 | Mathematical Reasoning | Unverified | 0 |
| SituatedThinker: Grounding LLM Reasoning with Real-World through Situated Thinking | May 25, 2025 | Mathematical Reasoning, Multi-hop Question Answering | Code Available | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment | May 25, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | May 25, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | May 24, 2025 | Automated Theorem Proving, Math | Code Available | 0 |
| Efficient Long CoT Reasoning in Small Language Models | May 24, 2025 | Mathematical Reasoning, valid | Unverified | 0 |
| LogicCat: A Chain-of-Thought Text-to-SQL Benchmark for Multi-Domain Reasoning Challenges | May 24, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 0 |
| Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation | May 24, 2025 | Mathematical Reasoning, Multimodal Reasoning | Unverified | 0 |
| Unraveling Misinformation Propagation in LLM Reasoning | May 24, 2025 | Mathematical Reasoning, Misinformation | Code Available | 0 |
| PPT: A Process-based Preference Learning Framework for Self Improving Table Question Answering Models | May 23, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence | May 23, 2025 | GPU, Large Language Model | Unverified | 0 |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | May 23, 2025 | Cross-Lingual Transfer, Math | Unverified | 0 |
| MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models | May 22, 2025 | Mathematical Reasoning | Unverified | 0 |