| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | Feb 10, 2025 | BenchmarkingIn-Context Learning | —Unverified | 0 |
| Self-Training Large Language Models for Tool-Use Without Demonstrations | Feb 9, 2025 | GSM8KMathematical Reasoning | —Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8KMath | —Unverified | 0 |
| KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Feb 6, 2025 | Mathematical ReasoningQuantization | CodeCode Available | 0 |
| LLMs can be easily Confused by Instructional Distractions | Feb 5, 2025 | Bias DetectionCode Generation | —Unverified | 0 |
| Path Planning for Masked Diffusion Model Sampling | Feb 5, 2025 | Code GenerationIn-Context Learning | —Unverified | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8KHumanEval | —Unverified | 0 |
| Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | Feb 5, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | Feb 4, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Feb 4, 2025 | Mathematical Reasoning | —Unverified | 0 |