| Large Language Models for Multi-Robot Systems: A Survey | Feb 6, 2025 | Action GenerationBenchmarking | CodeCode Available | 1 |
| KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference | Feb 6, 2025 | Mathematical ReasoningQuantization | CodeCode Available | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8KHumanEval | —Unverified | 0 |
| LIMO: Less is More for Reasoning | Feb 5, 2025 | MathMathematical Reasoning | CodeCode Available | 5 |
| LLMs can be easily Confused by Instructional Distractions | Feb 5, 2025 | Bias DetectionCode Generation | —Unverified | 0 |
| Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | Feb 5, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Path Planning for Masked Diffusion Model Sampling | Feb 5, 2025 | Code GenerationIn-Context Learning | —Unverified | 0 |
| Policy Guided Tree Search for Enhanced LLM Reasoning | Feb 4, 2025 | Mathematical ReasoningNavigate | —Unverified | 0 |
| Reusing Embeddings: Reproducible Reward Model Research in Large Language Model Alignment without GPUs | Feb 4, 2025 | Code GenerationLanguage Modeling | CodeCode Available | 2 |
| Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Feb 4, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | Feb 4, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | Feb 3, 2025 | Mathematical ReasoningMixture-of-Experts | —Unverified | 0 |
| A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Feb 3, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| Language Models Use Trigonometry to Do Addition | Feb 2, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Improving Rule-based Reasoning in LLMs via Neurosymbolic Representations | Jan 31, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Bridging the Reasoning Gap: Small LLMs Can Plan with Generalised Strategies | Jan 31, 2025 | Mathematical Reasoning | CodeCode Available | 0 |
| s1: Simple test-time scaling | Jan 31, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 9 |
| Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Jan 29, 2025 | Instruction FollowingMath | CodeCode Available | 2 |
| LemmaHead: RAG Assisted Proof Generation Using Large Language Models | Jan 27, 2025 | Automated Theorem ProvingMathematical Proofs | —Unverified | 0 |
| From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs | Jan 27, 2025 | 4kMathematical Reasoning | —Unverified | 0 |
| Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | Jan 26, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| The Karp Dataset | Jan 24, 2025 | BenchmarkingMathematical Reasoning | —Unverified | 0 |
| Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages | Jan 23, 2025 | Instruction FollowingMath | —Unverified | 0 |
| UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models | Jan 23, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Coarse-to-Fine Process Reward Modeling for Enhanced Mathematical Reasoning | Jan 23, 2025 | AttributeMathematical Reasoning | —Unverified | 0 |
| O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | Jan 22, 2025 | Mathematical Reasoning | CodeCode Available | 2 |
| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | Jan 22, 2025 | Mathematical ReasoningMulti-task Language Understanding | CodeCode Available | 15 |
| CDW-CoT: Clustered Distance-Weighted Chain-of-Thoughts Reasoning | Jan 21, 2025 | ClusteringMathematical Reasoning | —Unverified | 0 |
| InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Jan 21, 2025 | Instruction FollowingMathematical Reasoning | —Unverified | 0 |
| Benchmarking Large Language Models via Random Variables | Jan 20, 2025 | BenchmarkingMathematical Reasoning | —Unverified | 0 |
| Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | Jan 19, 2025 | Automated Theorem ProvingMath | —Unverified | 0 |
| Control LLM: Controlled Evolution for Intelligence Retention in LLM | Jan 19, 2025 | MathMathematical Reasoning | CodeCode Available | 1 |
| Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback | Jan 18, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| The Lessons of Developing Process Reward Models in Mathematical Reasoning | Jan 13, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs | Jan 11, 2025 | MathMathematical Problem-Solving | CodeCode Available | 1 |
| Search-o1: Agentic Search-Enhanced Large Reasoning Models | Jan 9, 2025 | Code Generation | CodeCode Available | 5 |
| VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models | Jan 9, 2025 | BenchmarkingMathematical Problem-Solving | CodeCode Available | 1 |
| URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Jan 8, 2025 | MathMathematical Reasoning | CodeCode Available | 2 |
| Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning | Jan 6, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap | Jan 5, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Table as Thought: Exploring Structured Thoughts in LLM Reasoning | Jan 4, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search | Jan 2, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Plug-and-Play Training Framework for Preference Optimization | Dec 30, 2024 | Mathematical ReasoningQuestion Answering | —Unverified | 0 |
| LLM2: Let Large Language Models Harness System 2 Reasoning | Dec 29, 2024 | GSM8KMathematical Reasoning | CodeCode Available | 0 |
| Large Language Models for Mathematical Analysis | Dec 28, 2024 | Mathematical Problem-SolvingMathematical Reasoning | CodeCode Available | 0 |
| LLM Reasoning Engine: Specialized Training for Enhanced Mathematical Reasoning | Dec 28, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Multilingual Mathematical Reasoning: Advancing Open-Source LLMs in Hindi and English | Dec 24, 2024 | Mathematical Reasoning | CodeCode Available | 0 |
| B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners | Dec 23, 2024 | Mathematical Reasoning | CodeCode Available | 2 |
| Multi-Agent Sampling: Scaling Inference Compute for Data Synthesis with Tree Search-Based Agentic Collaboration | Dec 22, 2024 | Decision MakingMachine Translation | CodeCode Available | 0 |
| System-2 Mathematical Reasoning via Enriched Instruction Tuning | Dec 22, 2024 | ERPGSM8K | —Unverified | 0 |