| FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models | May 5, 2025 | Benchmarking, Mathematical Reasoning | Code Available | 2 |
| Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | Apr 14, 2025 | Mathematical Reasoning, mbpp | Code Available | 2 |
| LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation | Apr 10, 2025 | Code Generation, Continual Learning | Code Available | 2 |
| Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Apr 8, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Apr 7, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | Diversity, Math | Code Available | 2 |
| Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Mar 27, 2025 | Imitation Learning, Mathematical Reasoning | Code Available | 2 |
| RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs | Mar 8, 2025 | Instruction Following, Mathematical Reasoning | Code Available | 2 |
| Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? | Mar 8, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 2 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Feb 10, 2025 | Math, Mathematical Reasoning | Code Available | 2 |