| Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs | May 19, 2025 | Mathematical ReasoningReinforcement Learning (RL) | —Unverified | 0 |
| Selective Code Generation for Functional Guarantees | May 19, 2025 | Code GenerationHallucination | —Unverified | 0 |
| Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning | May 19, 2025 | 2kMathematical Reasoning | —Unverified | 0 |
| AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database | May 19, 2025 | Data AugmentationIn-Context Learning | —Unverified | 0 |
| Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents | May 19, 2025 | Mathematical Reasoning | —Unverified | 0 |
| MARGE: Improving Math Reasoning for LLMs with Guided Exploration | May 18, 2025 | MathMathematical Reasoning | CodeCode Available | 0 |
| SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization | May 18, 2025 | MathMathematical Reasoning | —Unverified | 0 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | MathMathematical Problem-Solving | CodeCode Available | 0 |
| Token-Level Uncertainty Estimation for Large Language Model Reasoning | May 16, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Scaling Reasoning can Improve Factuality in Large Language Models | May 16, 2025 | Knowledge GraphsLarge Language Model | CodeCode Available | 0 |
| Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition | May 16, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations? | May 15, 2025 | Mathematical Reasoning | —Unverified | 0 |
| ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention | May 15, 2025 | Code GenerationLanguage Modeling | CodeCode Available | 0 |
| Agent-as-a-Service based on Agent Network | May 13, 2025 | Code GenerationMathematical Reasoning | —Unverified | 0 |
| Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation | May 13, 2025 | Imitation LearningMathematical Reasoning | —Unverified | 0 |
| Assessing Robustness to Spurious Correlations in Post-Training Language Models | May 9, 2025 | Instruction FollowingMathematical Reasoning | —Unverified | 0 |
| Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey | May 6, 2025 | Mathematical Reasoning | —Unverified | 0 |
| RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library | Apr 29, 2025 | Data AugmentationMathematical Reasoning | —Unverified | 0 |
| Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think | Apr 29, 2025 | Mathematical Reasoning | CodeCode Available | 0 |
| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Apr 28, 2025 | Data AugmentationDiversity | —Unverified | 0 |
| Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning | Apr 28, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models | Apr 28, 2025 | Mathematical ReasoningMeta-Learning | CodeCode Available | 0 |
| SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning | Apr 27, 2025 | Large Language ModelMathematical Reasoning | —Unverified | 0 |
| Hierarchical Attention Generates Better Proofs | Apr 27, 2025 | Automated Theorem ProvingMathematical Proofs | CodeCode Available | 0 |
| PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts | Apr 25, 2025 | DiversityMathematical Reasoning | —Unverified | 0 |