| Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition | May 16, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Scaling Reasoning can Improve Factuality in Large Language Models | May 16, 2025 | Knowledge Graphs, Large Language Model | Code Available | 0 |
| Group-in-Group Policy Optimization for LLM Agent Training | May 16, 2025 | GPU, Mathematical Reasoning | Code Available | 5 |
| Reasoning on a Budget: Miniaturizing DeepSeek R1 with SFT-GRPO Alignment for Instruction-Tuned LLMs | May 16, 2025 | Deep Reinforcement Learning, Mathematical Reasoning | Code Available | 1 |
| MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | May 15, 2025 | cross-modal alignment, Geometry Problem Solving | Code Available | 3 |
| Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations? | May 15, 2025 | Mathematical Reasoning | —Unverified | 0 |
| ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention | May 15, 2025 | Code Generation, Language Modeling | Code Available | 0 |
| DRA-GRPO: Exploring Diversity-Aware Reward Adjustment for R1-Zero-Like Training of Large Language Models | May 14, 2025 | Diversity, Mathematical Reasoning | Code Available | 1 |
| Qwen3 Technical Report | May 14, 2025 | Code Generation, Mathematical Reasoning | Code Available | 14 |
| Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation | May 13, 2025 | Imitation Learning, Mathematical Reasoning | —Unverified | 0 |
| Agent-as-a-Service based on Agent Network | May 13, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 |
| Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | May 12, 2025 | Math, Mathematical Problem-Solving | Code Available | 2 |
| Assessing Robustness to Spurious Correlations in Post-Training Language Models | May 9, 2025 | Instruction Following, Mathematical Reasoning | —Unverified | 0 |
| Crosslingual Reasoning through Test-Time Scaling | May 8, 2025 | Mathematical Reasoning | Code Available | 1 |
| Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey | May 6, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Absolute Zero: Reinforced Self-play Reasoning with Zero Data | May 6, 2025 | Mathematical Reasoning | Code Available | 11 |
| Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL | May 5, 2025 | Mathematical Reasoning | Code Available | 1 |
| FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | May 5, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| Rewriting Pre-Training Data Boosts LLM Performance in Math and Code | May 5, 2025 | Code Generation, GSM8K | Code Available | 1 |
| DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition | Apr 30, 2025 | Automated Theorem Proving, Large Language Model | Code Available | 5 |
| RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library | Apr 29, 2025 | Data Augmentation, Mathematical Reasoning | —Unverified | 0 |
| Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think | Apr 29, 2025 | Mathematical Reasoning | Code Available | 0 |
| Reinforcement Learning for Reasoning in Large Language Models with One Training Example | Apr 29, 2025 | Domain Generalization, Math | Code Available | 3 |
| Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning | Apr 28, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models | Apr 28, 2025 | Mathematical Reasoning, Meta-Learning | Code Available | 0 |