| The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? | Feb 19, 2025 | Math | Unverified | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Feb 19, 2025 | Dataset Generation, GSM8K | Code Available | 0 |
| BeamLoRA: Beam-Constraint Low-Rank Adaptation | Feb 19, 2025 | Code Generation, Math | Unverified | 0 |
| DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation | Feb 19, 2025 | Diversity, Extreme Summarization | Unverified | 0 |
| None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | Feb 18, 2025 | Math, Memorization | Unverified | 0 |
| Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization | Feb 18, 2025 | Math | Unverified | 0 |
| NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions | Feb 18, 2025 | Knowledge Distillation, Math | Unverified | 0 |
| Thinking Outside the (Gray) Box: A Context-Based Score for Assessing Value and Originality in Neural Text Generation | Feb 18, 2025 | Diversity, Math | Unverified | 0 |
| Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees | Feb 18, 2025 | Math | Unverified | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | Feb 17, 2025 | Benchmarking, Code Summarization | Unverified | 0 |