| Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models | Mar 3, 2025 | Math | —Unverified | 0 |
| MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts | Feb 28, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training | Feb 28, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 0 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8KMath | CodeCode Available | 1 |
| Self-Training Elicits Concise Reasoning in Large Language Models | Feb 27, 2025 | GSM8KIn-Context Learning | CodeCode Available | 1 |
| Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | Feb 27, 2025 | MathMedical Question Answering | —Unverified | 0 |
| Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Feb 26, 2025 | Math | CodeCode Available | 1 |
| Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Feb 26, 2025 | Code GenerationHumanEval | CodeCode Available | 2 |
| Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | Feb 25, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Feb 25, 2025 | MathReinforcement Learning (RL) | —Unverified | 0 |
| From Euler to AI: Unifying Formulas for Mathematical Constants | Feb 24, 2025 | Math | CodeCode Available | 0 |
| Learning Decentralized Swarms Using Rotation Equivariant Graph Neural Networks | Feb 24, 2025 | Graph Neural NetworkMath | CodeCode Available | 0 |
| Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Feb 24, 2025 | GSM8KMath | CodeCode Available | 2 |
| Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | Feb 24, 2025 | MathMathematical Reasoning | CodeCode Available | 0 |
| Reasoning with Latent Thoughts: On the Power of Looped Transformers | Feb 24, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling | Feb 23, 2025 | Computational EfficiencyMath | —Unverified | 0 |
| SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance | Feb 23, 2025 | Math | —Unverified | 0 |
| Inference Computation Scaling for Feature Augmentation in Recommendation Systems | Feb 22, 2025 | MathRecommendation Systems | —Unverified | 0 |
| Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning | Feb 21, 2025 | Math | —Unverified | 0 |
| The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Feb 21, 2025 | MathMathematical Reasoning | CodeCode Available | 0 |
| Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Feb 21, 2025 | MathMathematical Problem-Solving | CodeCode Available | 1 |
| How to Get Your LLM to Generate Challenging Problems for Evaluation | Feb 20, 2025 | Code CompletionMath | CodeCode Available | 1 |
| S*: Test Time Scaling for Code Generation | Feb 20, 2025 | Code GenerationMath | CodeCode Available | 7 |
| GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Feb 20, 2025 | Code GenerationMath | CodeCode Available | 0 |
| Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Feb 20, 2025 | Mathreinforcement-learning | CodeCode Available | 7 |