| Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting | Feb 5, 2025 | GSM8KMath | CodeCode Available | 0 |
| Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference | Feb 5, 2025 | Computational EfficiencyLanguage Modeling | —Unverified | 0 |
| LIMO: Less is More for Reasoning | Feb 5, 2025 | MathMathematical Reasoning | CodeCode Available | 5 |
| Do Large Language Model Benchmarks Test Reliability? | Feb 5, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model | Feb 4, 2025 | Instruction FollowingLanguage Modeling | —Unverified | 0 |
| Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | Feb 4, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Process Reinforcement through Implicit Rewards | Feb 3, 2025 | MathReinforcement Learning (RL) | CodeCode Available | 5 |
| A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Feb 3, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| Blink of an eye: a simple theory for feature localization in generative models | Feb 2, 2025 | Math | —Unverified | 0 |
| Learning Autonomous Code Integration for Math Language Models | Feb 2, 2025 | Math | —Unverified | 0 |
| Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | Feb 2, 2025 | MathMMLU | —Unverified | 0 |
| UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models | Feb 1, 2025 | Math | CodeCode Available | 2 |
| Fairshare Data Pricing via Data Valuation for Large Language Models | Jan 31, 2025 | Data ValuationMath | —Unverified | 0 |
| BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | Jan 31, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| s1: Simple test-time scaling | Jan 31, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 9 |
| Pheromone-based Learning of Optimal Reasoning Paths | Jan 31, 2025 | ARCGSM8K | —Unverified | 0 |
| Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Boostrapping | Jan 31, 2025 | DenoisingImage Denoising | CodeCode Available | 0 |
| PixelWorld: Towards Perceiving Everything as Pixels | Jan 31, 2025 | Math | —Unverified | 0 |
| Examining the Robustness of Large Language Models across Language Complexity | Jan 30, 2025 | Math | —Unverified | 0 |
| Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis | Jan 30, 2025 | Automated Theorem ProvingMath | CodeCode Available | 1 |
| Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | Jan 30, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Jan 29, 2025 | Instruction FollowingMath | CodeCode Available | 2 |
| Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | Jan 28, 2025 | MathMathematical Problem-Solving | —Unverified | 0 |
| Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Jan 26, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning | Jan 25, 2025 | Math | —Unverified | 0 |