| Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | May 19, 2025 | HumanEval, Math | Code Available | 0 |
| Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | May 19, 2025 | GSM8K, Math | Code Available | 2 |
| AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database | May 19, 2025 | Data Augmentation, In-Context Learning | Unverified | 0 |
| AdaptThink: Reasoning Models Can Learn When to Think | May 19, 2025 | Math | Code Available | 2 |
| Thinkless: LLM Learns When to Think | May 19, 2025 | GSM8K, Math | Code Available | 3 |
| MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | May 19, 2025 | Math, Mathematical Reasoning | Code Available | 4 |
| Synthetic Data RL: Task Definition Is All You Need | May 18, 2025 | All, GSM8K | Code Available | 2 |
| Efficient RL Training for Reasoning Models via Length-Aware Optimization | May 18, 2025 | Math | Code Available | 1 |
| SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization | May 18, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |
| LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades | May 17, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities | May 17, 2025 | Math | Unverified | 0 |
| HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | May 17, 2025 | Arithmetic Reasoning, Code Generation | Code Available | 1 |
| MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | May 16, 2025 | Diagnostic, Math | Code Available | 1 |
| Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation | May 16, 2025 | Math, MMLU | Unverified | 0 |
| SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning | May 16, 2025 | Math | Unverified | 0 |
| HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization | May 16, 2025 | Math | Code Available | 0 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | May 15, 2025 | cross-modal alignment, Geometry Problem Solving | Code Available | 3 |
| Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | May 15, 2025 | Math, reinforcement-learning | Code Available | 2 |
| Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models | May 15, 2025 | Large Language Model, Math | Code Available | 0 |
| Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code Generation, GSM8K | Unverified | 0 |
| DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs | May 15, 2025 | Benchmarking, Fairness | Unverified | 0 |
| PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | May 14, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |
| Measurement to Meaning: A Validity-Centered Framework for AI Evaluation | May 13, 2025 | Math | Unverified | 0 |