| Scaling Reasoning can Improve Factuality in Large Language Models | May 16, 2025 | Knowledge Graphs, Large Language Model | Code Available | 0 |
| Group-in-Group Policy Optimization for LLM Agent Training | May 16, 2025 | GPU, Mathematical Reasoning | Code Available | 5 |
| Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition | May 16, 2025 | Mathematical Reasoning | Unverified | 0 |
| Reasoning on a Budget: Miniaturizing DeepSeek R1 with SFT-GRPO Alignment for Instruction-Tuned LLMs | May 16, 2025 | Deep Reinforcement Learning, Mathematical Reasoning | Code Available | 1 |
| Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations? | May 15, 2025 | Mathematical Reasoning | Unverified | 0 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | May 15, 2025 | cross-modal alignment, Geometry Problem Solving | Code Available | 3 |
| ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention | May 15, 2025 | Code Generation, Language Modeling | Code Available | 0 |
| DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models | May 14, 2025 | Diversity, Mathematical Reasoning | Code Available | 1 |
| Qwen3 Technical Report | May 14, 2025 | Code Generation, Mathematical Reasoning | Code Available | 13 |
| Agent-as-a-Service based on Agent Network | May 13, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation | May 13, 2025 | Imitation Learning, Mathematical Reasoning | Unverified | 0 |
| Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | May 12, 2025 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Assessing Robustness to Spurious Correlations in Post-Training Language Models | May 9, 2025 | Instruction Following, Mathematical Reasoning | Unverified | 0 |
| Crosslingual Reasoning through Test-Time Scaling | May 8, 2025 | Mathematical Reasoning | Code Available | 1 |
| Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey | May 6, 2025 | Mathematical Reasoning | Unverified | 0 |
| Absolute Zero: Reinforced Self-play Reasoning with Zero Data | May 6, 2025 | Mathematical Reasoning | Code Available | 11 |
| Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL | May 5, 2025 | Mathematical Reasoning | Code Available | 1 |
| FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | May 5, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | May 5, 2025 | Code Generation, GSM8K | Code Available | 1 |
| DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Apr 30, 2025 | Automated Theorem Proving, Large Language Model | Code Available | 5 |
| RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library | Apr 29, 2025 | Data Augmentation, Mathematical Reasoning | Unverified | 0 |
| Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Apr 29, 2025 | Domain Generalization, Math | Code Available | 3 |
| Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think | Apr 29, 2025 | Mathematical Reasoning | Code Available | 0 |
| Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning | Apr 28, 2025 | Mathematical Reasoning | Unverified | 0 |
| Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models | Apr 28, 2025 | Mathematical Reasoning, Meta-Learning | Code Available | 0 |
| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Apr 28, 2025 | Data Augmentation, Diversity | Unverified | 0 |
| SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning | Apr 27, 2025 | Large Language Model, Mathematical Reasoning | Unverified | 0 |
| Hierarchical Attention Generates Better Proofs | Apr 27, 2025 | Automated Theorem Proving, Mathematical Proofs | Code Available | 0 |
| PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts | Apr 25, 2025 | Diversity, Mathematical Reasoning | Unverified | 0 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code Generation, Math | Unverified | 0 |
| DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training | Apr 24, 2025 | Mathematical Reasoning | Unverified | 0 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | Benchmarking, Math | Code Available | 1 |
| Parameter-Efficient Checkpoint Merging via Metrics-Weighted Averaging | Apr 23, 2025 | Mathematical Reasoning, parameter-efficient fine-tuning | Unverified | 0 |
| AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | Apr 23, 2025 | Math, Mathematical Reasoning | Code Available | 4 |
| Improving RL Exploration for LLM Reasoning through Retrospective Replay | Apr 19, 2025 | Code Generation, Mathematical Reasoning | Unverified | 0 |
| Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration | Apr 17, 2025 | Geometry Problem Solving, Large Language Model | Code Available | 1 |
| Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? | Apr 16, 2025 | Mathematical Reasoning | Code Available | 1 |
| BitNet b1.58 2B4T Technical Report | Apr 16, 2025 | Computational Efficiency, CPU | Unverified | 0 |
| ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | Apr 15, 2025 | Math, Mathematical Reasoning | Code Available | 0 |
| Assessment of Evolving Large Language Models in Upper Secondary Mathematics | Apr 15, 2025 | Mathematical Reasoning | Unverified | 0 |
| Teaching Large Language Models to Reason through Learning and Forgetting | Apr 15, 2025 | Mathematical Reasoning | Code Available | 1 |
| A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Apr 15, 2025 | Code Generation, General Knowledge | Code Available | 1 |
| DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Apr 15, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 3 |
| Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | Apr 14, 2025 | Mathematical Reasoning, mbpp | Code Available | 2 |
| Breaking the Data Barrier -- Building GUI Agents Through Task Generalization | Apr 14, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 1 |
| Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection | Apr 13, 2025 | Answer Selection, Automated Theorem Proving | Unverified | 0 |
| GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models | Apr 13, 2025 | Mathematical Reasoning | Code Available | 1 |
| Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | Apr 10, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | Code Available | 1 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8K, Math | Unverified | 0 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding, Mathematical Reasoning | Code Available | 5 |