| CER: Confidence Enhanced Reasoning in LLMs | Feb 20, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Feb 20, 2025 | GSM8K, Math | Code Available | 0 |
| A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics | Feb 20, 2025 | Math | Unverified | 0 |
| SIFT: Grounding LLM Reasoning in Contexts via Stickers | Feb 19, 2025 | GSM8K, Math | Code Available | 2 |
| BeamLoRA: Beam-Constraint Low-Rank Adaptation | Feb 19, 2025 | Code Generation, Math | Unverified | 0 |
| DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation | Feb 19, 2025 | Diversity, Extreme Summarization | Unverified | 0 |
| The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? | Feb 19, 2025 | Math | Unverified | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Feb 19, 2025 | Dataset Generation, GSM8K | Code Available | 0 |
| Reasoning with Reinforced Functional Token Tuning | Feb 19, 2025 | Math | Code Available | 1 |
| Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization | Feb 18, 2025 | Math | Unverified | 0 |
| Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees | Feb 18, 2025 | Math | Unverified | 0 |
| None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | Feb 18, 2025 | Math, Memorization | Unverified | 0 |
| S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning | Feb 18, 2025 | Math | Code Available | 2 |
| NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions | Feb 18, 2025 | Knowledge Distillation, Math | Unverified | 0 |
| Thinking Outside the (Gray) Box: A Context-Based Score for Assessing Value and Originality in Neural Text Generation | Feb 18, 2025 | Diversity, Math | Unverified | 0 |
| Thinking Preference Optimization | Feb 17, 2025 | Math | Code Available | 1 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | Feb 17, 2025 | Code Completion, GSM8K | Unverified | 0 |
| Scaling Test-Time Compute Without Verification or RL is Suboptimal | Feb 17, 2025 | Math, Reinforcement Learning (RL) | Unverified | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | Feb 17, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | Feb 17, 2025 | Benchmarking, Code Summarization | Unverified | 0 |
| Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | Feb 17, 2025 | Arithmetic Reasoning, Chart Understanding | Unverified | 0 |
| A Study on Leveraging Search and Self-Feedback for Agent Reasoning | Feb 17, 2025 | Math | Unverified | 0 |
| Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation | Feb 17, 2025 | Knowledge Distillation, Math | Code Available | 0 |
| Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models | Feb 17, 2025 | Math | Unverified | 0 |
| Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL | Feb 17, 2025 | Code Generation, Math | Code Available | 1 |