| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Apr 28, 2025 | Data AugmentationDiversity | —Unverified | 0 |
| SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning | Apr 27, 2025 | Large Language ModelMathematical Reasoning | —Unverified | 0 |
| Hierarchical Attention Generates Better Proofs | Apr 27, 2025 | Automated Theorem ProvingMathematical Proofs | CodeCode Available | 0 |
| PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts | Apr 25, 2025 | DiversityMathematical Reasoning | —Unverified | 0 |
| Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | Apr 24, 2025 | Code GenerationMath | —Unverified | 0 |
| DeepDistill: Enhancing LLM Reasoning Capabilities via Large-Scale Difficulty-Graded Data Training | Apr 24, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Apr 24, 2025 | BenchmarkingMath | CodeCode Available | 1 |
| Parameter-Efficient Checkpoint Merging via Metrics-Weighted Averaging | Apr 23, 2025 | Mathematical Reasoningparameter-efficient fine-tuning | —Unverified | 0 |
| AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset | Apr 23, 2025 | MathMathematical Reasoning | CodeCode Available | 4 |
| Improving RL Exploration for LLM Reasoning through Retrospective Replay | Apr 19, 2025 | Code GenerationMathematical Reasoning | —Unverified | 0 |
| Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration | Apr 17, 2025 | Geometry Problem SolvingLarge Language Model | CodeCode Available | 1 |
| Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? | Apr 16, 2025 | Mathematical Reasoning | CodeCode Available | 1 |
| BitNet b1.58 2B4T Technical Report | Apr 16, 2025 | Computational EfficiencyCPU | —Unverified | 0 |
| ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | Apr 15, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| Assessment of Evolving Large Language Models in Upper Secondary Mathematics | Apr 15, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Teaching Large Language Models to Reason through Learning and Forgetting | Apr 15, 2025 | Mathematical Reasoning | CodeCode Available | 1 |
| A Dual-Space Framework for General Knowledge Distillation of Large Language Models | Apr 15, 2025 | Code GenerationGeneral Knowledge | CodeCode Available | 1 |
| DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | Apr 15, 2025 | Mathematical ReasoningReinforcement Learning (RL) | CodeCode Available | 3 |
| Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning | Apr 14, 2025 | Mathematical Reasoningmbpp | CodeCode Available | 2 |
| Breaking the Data Barrier -- Building GUI Agents Through Task Generalization | Apr 14, 2025 | Mathematical ReasoningMultimodal Reasoning | CodeCode Available | 1 |
| Enhancing Mathematical Reasoning in Large Language Models with Self-Consistency-Based Hallucination Detection | Apr 13, 2025 | Answer SelectionAutomated Theorem Proving | —Unverified | 0 |
| GRPO-LEAD: A Difficulty-Aware Reinforcement Learning Approach for Concise Mathematical Reasoning in Language Models | Apr 13, 2025 | Mathematical Reasoning | CodeCode Available | 1 |
| Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | Apr 10, 2025 | Mathematical ReasoningReinforcement Learning (RL) | CodeCode Available | 1 |
| Supervised Optimism Correction: Be Confident When LLMs Are Sure | Apr 10, 2025 | GSM8KMath | —Unverified | 0 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context UnderstandingMathematical Reasoning | CodeCode Available | 5 |