| Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | Feb 12, 2025 | Mathematical Reasoning, MMLU | Unverified | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Feb 12, 2025 | GSM8K, Math | Code Available | 0 |
| One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs | Feb 12, 2025 | Mathematical Reasoning | Unverified | 0 |
| LLMs can implicitly learn from mistakes in-context | Feb 12, 2025 | Mathematical Reasoning | Unverified | 0 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Feb 11, 2025 | Code Generation, Math | Code Available | 0 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | Feb 10, 2025 | Benchmarking, In-Context Learning | Unverified | 0 |
| Self-Training Large Language Models for Tool-Use Without Demonstrations | Feb 9, 2025 | GSM8K, Mathematical Reasoning | Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | Unverified | 0 |
| KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Feb 6, 2025 | Mathematical Reasoning, Quantization | Code Available | 0 |
| LLMs can be easily Confused by Instructional Distractions | Feb 5, 2025 | Bias Detection, Code Generation | Unverified | 0 |
| Path Planning for Masked Diffusion Model Sampling | Feb 5, 2025 | Code Generation, In-Context Learning | Unverified | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8K, HumanEval | Unverified | 0 |
| Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | Feb 5, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | Feb 4, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Feb 4, 2025 | Mathematical Reasoning | Unverified | 0 |
| Policy Guided Tree Search for Enhanced LLM Reasoning | Feb 4, 2025 | Mathematical Reasoning, Navigate | Unverified | 0 |
| MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | Feb 3, 2025 | Mathematical Reasoning, Mixture-of-Experts | Unverified | 0 |
| Language Models Use Trigonometry to Do Addition | Feb 2, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Bridging the Reasoning Gap: Small LLMs Can Plan with Generalised Strategies | Jan 31, 2025 | Mathematical Reasoning | Code Available | 0 |
| Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations | Jan 31, 2025 | Mathematical Reasoning | Unverified | 0 |
| LemmaHead: RAG Assisted Proof Generation Using Large Language Models | Jan 27, 2025 | Automated Theorem Proving, Mathematical Proofs | Unverified | 0 |
| From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs | Jan 27, 2025 | 4k, Mathematical Reasoning | Unverified | 0 |
| Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Jan 26, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| The Karp Dataset | Jan 24, 2025 | Benchmarking, Mathematical Reasoning | Unverified | 0 |
| Coarse-to-Fine Process Reward Modeling for Enhanced Mathematical Reasoning | Jan 23, 2025 | Attribute, Mathematical Reasoning | Unverified | 0 |