| From Euler to AI: Unifying Formulas for Mathematical Constants | Feb 24, 2025 | Math | Code Available | 0 |
| SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance | Feb 23, 2025 | Math | Unverified | 0 |
| DISC: Dynamic Decomposition Improves LLM Inference Scaling | Feb 23, 2025 | Computational Efficiency, Math | Unverified | 0 |
| Inference Computation Scaling for Feature Augmentation in Recommendation Systems | Feb 22, 2025 | Math, Recommendation Systems | Unverified | 0 |
| Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning | Feb 21, 2025 | Math | Unverified | 0 |
| The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Feb 21, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 |
| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Feb 20, 2025 | Code Generation, Math | Code Available | 0 |
| A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics | Feb 20, 2025 | Math | Unverified | 0 |
| BeamLoRA: Beam-Constraint Low-Rank Adaptation | Feb 19, 2025 | Code Generation, Math | Unverified | 0 |
| DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation | Feb 19, 2025 | Diversity, Extreme Summarization | Unverified | 0 |
| The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? | Feb 19, 2025 | Math | Unverified | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Feb 19, 2025 | Dataset Generation, GSM8K | Code Available | 0 |
| None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | Feb 18, 2025 | Math, Memorization | Unverified | 0 |
| NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions | Feb 18, 2025 | Knowledge Distillation, Math | Unverified | 0 |
| Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees | Feb 18, 2025 | Math | Unverified | 0 |
| Thinking Outside the (Gray) Box: A Context-Based Score for Assessing Value and Originality in Neural Text Generation | Feb 18, 2025 | Diversity, Math | Unverified | 0 |
| Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization | Feb 18, 2025 | Math | Unverified | 0 |
| Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | Feb 17, 2025 | Arithmetic Reasoning, Chart Understanding | Unverified | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | Feb 17, 2025 | Benchmarking, Code Summarization | Unverified | 0 |
| A Study on Leveraging Search and Self-Feedback for Agent Reasoning | Feb 17, 2025 | Math | Unverified | 0 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | Feb 17, 2025 | Code Completion, GSM8K | Unverified | 0 |
| Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models | Feb 17, 2025 | Math | Unverified | 0 |
| Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation | Feb 17, 2025 | Knowledge Distillation, Math | Code Available | 0 |
| Scaling Test-Time Compute Without Verification or RL is Suboptimal | Feb 17, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | Feb 17, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Feb 16, 2025 | Computational Efficiency, GSM8K | Code Available | 0 |
| Graders should cheat: privileged information enables expert-level automated evaluations | Feb 16, 2025 | Math | Unverified | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | Feb 15, 2025 | Code Generation, Math | Unverified | 0 |
| CRANE: Reasoning with constrained LLM generation | Feb 13, 2025 | Code Generation, Math | Unverified | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Feb 13, 2025 | Benchmarking, Math | Unverified | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | Feb 12, 2025 | Math, multimodal interaction | Unverified | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8K, Math | Code Available | 0 |
| O1 Embedder: Let Retrievers Think Before Action | Feb 11, 2025 | Contrastive Learning, Math | Unverified | 0 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code Generation, Math | Code Available | 0 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | Feb 10, 2025 | Benchmarking, In-Context Learning | Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | Unverified | 0 |
| BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | Feb 6, 2025 | In-Context Learning, Knowledge Distillation | Unverified | 0 |
| Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference | Feb 5, 2025 | Computational Efficiency, Language Modeling | Unverified | 0 |
| Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting | Feb 5, 2025 | GSM8K, Math | Code Available | 0 |
| Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 | Feb 5, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8K, HumanEval | Unverified | 0 |
| Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | Feb 4, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model | Feb 4, 2025 | Instruction Following, Language Modeling | Unverified | 0 |
| Learning Autonomous Code Integration for Math Language Models | Feb 2, 2025 | Math | Unverified | 0 |
| Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | Feb 2, 2025 | Math, MMLU | Unverified | 0 |
| Blink of an eye: a simple theory for feature localization in generative models | Feb 2, 2025 | Math | Unverified | 0 |
| BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | Jan 31, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Boostrapping | Jan 31, 2025 | Denoising, Image Denoising | Code Available | 0 |