| From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | Mar 10, 2025 | Math, Question Answering | Unverified | 0 |
| Decoding the Black Box: Integrating Moral Imagination with Technical AI Governance | Mar 9, 2025 | Ethics, Math | Unverified | 0 |
| InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models | Mar 9, 2025 | Computational Efficiency, Math | Unverified | 0 |
| Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | Mar 7, 2025 | GPU, Math | Unverified | 0 |
| START: Self-taught Reasoner with Tools | Mar 6, 2025 | Math, Self-Learning | Unverified | 0 |
| Better Process Supervision with Bi-directional Rewarding Signals | Mar 6, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Benchmarking Reasoning Robustness in Large Language Models | Mar 6, 2025 | Benchmarking, Math | Unverified | 0 |
| SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning | Mar 6, 2025 | GSM8K, Math | Unverified | 0 |
| HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks | Mar 6, 2025 | Chatbot, Logical Reasoning | Unverified | 0 |
| Compositional Causal Reasoning Evaluation in Language Models | Mar 6, 2025 | Math | Unverified | 0 |
| Performance Comparison of Large Language Models on Advanced Calculus Problems | Mar 5, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach | Mar 5, 2025 | Instruction Following, Math | Unverified | 0 |
| FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | Mar 5, 2025 | Answer Selection, Math | Unverified | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | Mar 4, 2025 | GSM8K, Math | Unverified | 0 |
| When an LLM is apprehensive about its answers -- and when its uncertainty is justified | Mar 3, 2025 | Math, MMLU | Code Available | 0 |
| What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret | Mar 3, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models | Mar 3, 2025 | Math | Unverified | 0 |
| MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts | Feb 28, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training | Feb 28, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | Feb 27, 2025 | Math, Medical Question Answering | Unverified | 0 |
| Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | Feb 25, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | Feb 25, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Reasoning with Latent Thoughts: On the Power of Looped Transformers | Feb 24, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Learning Decentralized Swarms Using Rotation Equivariant Graph Neural Networks | Feb 24, 2025 | Graph Neural Network, Math | Code Available | 0 |
| Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | Feb 24, 2025 | Math, Mathematical Reasoning | Code Available | 0 |