ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering May 29, 2025 Large Language Model Prompt Engineering
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO May 28, 2025 Math Reinforcement Learning (RL)
cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning May 28, 2025 CAD Reconstruction Large Language Model
SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution May 27, 2025 Reinforcement Learning (RL)
Reinforcing General Reasoning without Verifiers May 27, 2025 Math Mathematical Reasoning
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration May 26, 2025 Domain Generalization Hallucination
SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond May 26, 2025 Logical Reasoning Reinforcement Learning (RL)
DoctorAgent-RL: A Multi-Agent Collaborative Reinforcement Learning System for Multi-Turn Clinical Dialogue May 26, 2025 Diagnostic Question Answering
MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability May 26, 2025 Multi-hop Question Answering Question Answering
SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development May 22, 2025 Bug fixing Chatbot
SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward May 22, 2025 Reinforcement Learning (RL)
ARPO:End-to-End Policy Optimization for GUI Agents with Experience Replay May 22, 2025 reinforcement-learning Reinforcement Learning
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models May 22, 2025 Reinforcement Learning (RL)
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning May 22, 2025 Math Reinforcement Learning (RL)
Learn to Reason Efficiently with Adaptive Length-based Reward Shaping May 21, 2025 Reinforcement Learning (RL)
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning May 21, 2025 Math Mathematical Reasoning
Optimizing Anytime Reasoning via Budget Relative Policy Optimization May 19, 2025 Mathematical Reasoning Reinforcement Learning (RL)
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning May 19, 2025 Language Modeling Language Modelling
Synthetic Data RL: Task Definition Is All You Need May 18, 2025 All GSM8K
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning May 18, 2025 Reinforcement Learning (RL)
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy May 16, 2025 Reinforcement Learning (RL)
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models May 15, 2025 Math reinforcement-learning
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation May 12, 2025 Language Modeling Language Modelling
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent May 12, 2025 RAG Reinforcement Learning (RL)
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving May 12, 2025 Math Mathematical Problem-Solving
RM-R1: Reward Modeling as Reasoning May 5, 2025 Math Reinforcement Learning (RL)
Rulebook: bringing co-routines to reinforcement learning environments Apr 28, 2025 reinforcement-learning Reinforcement Learning
CaRL: Learning Scalable Planning Policies with Simple Rewards Apr 24, 2025 Autonomous Driving CARLA longest6
FlowReasoner: Reinforcing Query-Level Meta-Agents Apr 21, 2025 Reinforcement Learning (RL)
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning Apr 21, 2025 All Form
Generative Auto-Bidding with Value-Guided Explorations Apr 20, 2025 Reinforcement Learning (RL)
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning Apr 17, 2025 Multimodal Reasoning Reinforcement Learning (RL)
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Apr 17, 2025 Data Augmentation Diversity
MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning Apr 14, 2025 Machine Translation Reinforcement Learning (RL)
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models Apr 10, 2025 Reinforcement Learning (RL) Visual Reasoning
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization Apr 8, 2025 Math Mathematical Reasoning
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme Apr 3, 2025 Reinforcement Learning (RL) Visual Reasoning
GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning Apr 3, 2025 Reinforcement Learning (RL)
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Mar 31, 2025 Logical Reasoning Multiple-choice
UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning Mar 27, 2025 Model Optimization Reinforcement Learning (RL)
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging Mar 26, 2025 Prompt Engineering Reinforcement Learning (RL)
Surrogate Learning in Meta-Black-Box Optimization: A Preliminary Study Mar 23, 2025 Kolmogorov-Arnold Networks Reinforcement Learning (RL)
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Mar 21, 2025 Multimodal Reasoning Reinforcement Learning (RL)
Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning Mar 20, 2025 Classification Few-Shot Learning
Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models Mar 18, 2025 Anatomy Attribute
Reinforcement learning-based motion imitation for physiologically plausible musculoskeletal motor control Mar 18, 2025 Humanoid Control Motion Synthesis
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards Mar 14, 2025 Denoising Image Generation
V-Max: A Reinforcement Learning Framework for Autonomous Driving Mar 11, 2025 Autonomous Driving Decision Making
Agent models: Internalizing Chain-of-Action Generation into Reasoning models Mar 9, 2025 Action Generation Reinforcement Learning (RL)
Multi-Stage Manipulation with Demonstration-Augmented Reward, Policy, and World Model Learning Mar 3, 2025 Reinforcement Learning (RL)