| Flaming-hot Initiation with Regular Execution Sampling for Large Language Models | Oct 28, 2024 | Diversity, Math | Code Available | 2 | 5 |
| MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Oct 5, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 2 | 5 |
| Archon: An Architecture Search Framework for Inference-Time Techniques | Sep 23, 2024 | Hyperparameter Optimization, Instruction Following | Code Available | 2 | 5 |
| AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Jun 10, 2025 | Math | Code Available | 2 | 5 |
| Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | May 28, 2025 | Math, Reinforcement Learning (RL) | Code Available | 2 | 5 |
| MathPile: A Billion-Token-Scale Pretraining Corpus for Math | Dec 28, 2023 | Language Identification, Math | Code Available | 2 | 5 |
| Memorizing Transformers | Mar 16, 2022 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Feb 10, 2025 | Math | Code Available | 2 | 5 |
| SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models | Jan 15, 2024 | Math, Mathematical Reasoning | Code Available | 2 | 5 |
| Expression Syntax Information Bottleneck for Math Word Problems | Oct 24, 2023 | Math | Code Available | 1 | 5 |
| M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Apr 14, 2025 | Mamba, Math | Code Available | 1 | 5 |
| Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Jun 24, 2024 | Instruction Following, Math | Code Available | 1 | 5 |
| A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Feb 3, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Feb 26, 2025 | Math | Code Available | 1 | 5 |
| Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Sep 13, 2024 | Clustering, Language Modeling | Code Available | 1 | 5 |
| Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education | Jan 30, 2023 | Math, Position | Code Available | 1 | 5 |
| A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Jul 11, 2025 | Math, Mathematical Reasoning | Code Available | 1 | 5 |
| LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Mar 25, 2025 | Code Completion, Language Modeling | Code Available | 1 | 5 |
| Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Jun 22, 2025 | In-Context Learning, Large Language Model | Code Available | 1 | 5 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Jul 5, 2025 | Benchmarking, GPU | Code Available | 1 | 5 |
| EXAONE Deep: Reasoning Enhanced Language Models | Mar 16, 2025 | Math | Code Available | 1 | 5 |
| LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Oct 16, 2024 | Math, parameter-efficient fine-tuning | Code Available | 1 | 5 |
| MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Feb 2, 2024 | Language Modelling, Large Language Model | Code Available | 1 | 5 |
| Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Jun 1, 2022 | Math | Code Available | 1 | 5 |
| Broken Neural Scaling Laws | Oct 26, 2022 | Adversarial Robustness, Continual Learning | Code Available | 1 | 5 |