Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction | Jun 9, 2025 | Reinforcement Learning (RL) | Code Available | 2
Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning | Jun 8, 2025 | Offline RL Question Answering | Unverified | 0
Reliable Critics: Monotonic Improvement and Convergence Guarantees for Reinforcement Learning | Jun 8, 2025 | Reinforcement Learning (RL) | Unverified | 0
On the Generalization of Data-Assisted Control in port-Hamiltonian Systems (DAC-pH) | Jun 8, 2025 | parameter estimation Reinforcement Learning (RL) | Unverified | 0
CARoL: Context-aware Adaptation for Robot Learning | Jun 8, 2025 | Reinforcement Learning (RL) | Unverified | 0
Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression | Jun 8, 2025 | quantile regression Reinforcement Learning (RL) | Unverified | 0
QForce-RL: Quantized FPGA-Optimized Reinforcement Learning Compute Engine | Jun 8, 2025 | Decision Making Quantization | Unverified | 0
Prompting Wireless Networks: Reinforced In-Context Learning for Power Control | Jun 6, 2025 | Decision Making In-Context Learning | Unverified | 0
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning | Jun 6, 2025 | Reinforcement Learning (RL) | Code Available | 0
Towards Infant Sleep-Optimized Driving: Synergizing Wearable and Vehicle Sensing in Intelligent Cruise Control | Jun 6, 2025 | Reinforcement Learning (RL) Sleep Quality | Unverified | 0
CodeContests+: High-Quality Test Case Generation for Competitive Programming | Jun 6, 2025 | Reinforcement Learning (RL) | Unverified | 0
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | Jun 5, 2025 | All Math | Unverified | 0
Dissecting Long Reasoning Models: An Empirical Study | Jun 5, 2025 | Reinforcement Learning (RL) | Code Available | 0
Safe Planning and Policy Optimization via World Model Learning | Jun 5, 2025 | continuous-control Continuous Control | Unverified | 0
Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning | Jun 5, 2025 | Mathematical Reasoning Problem Decomposition | Unverified | 0
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay | Jun 5, 2025 | Reinforcement Learning (RL) | Code Available | 1
On the Mechanism of Reasoning Pattern Selection in Reinforcement Learning for Language Models | Jun 5, 2025 | Instruction Following Reinforcement Learning (RL) | Unverified | 0
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning | Jun 5, 2025 | Q-Learning Reinforcement Learning (RL) | Unverified | 0
Latent Guided Sampling for Combinatorial Optimization | Jun 4, 2025 | Combinatorial Optimization Drug Discovery | Code Available | 0
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Jun 4, 2025 | Multimodal Reasoning Reinforcement Learning (RL) | Unverified | 0
A Lyapunov Drift-Plus-Penalty Method Tailored for Reinforcement Learning with Queue Stability | Jun 4, 2025 | reinforcement-learning Reinforcement Learning | Unverified | 0
Learning-at-Criticality in Large Language Models for Quantum Field Theory and Beyond | Jun 4, 2025 | Arithmetic Reasoning Reinforcement Learning (RL) | Unverified | 0
SLAC: Simulation-Pretrained Latent Action Space for Whole-Body Real-World RL | Jun 4, 2025 | Disentanglement Industrial Robots | Unverified | 0
CORE: Constraint-Aware One-Step Reinforcement Learning for Simulation-Guided Neural Network Accelerator Design | Jun 4, 2025 | Reinforcement Learning (RL) | Unverified | 0
Joint Modeling for Learning Decision-Making Dynamics in Behavioral Experiments | Jun 3, 2025 | Decision Making Reinforcement Learning (RL) | Unverified | 0
Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback | Jun 3, 2025 | Reinforcement Learning (RL) | Unverified | 0
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem | Jun 3, 2025 | GPU Math | Unverified | 0
Learned Controllers for Agile Quadrotors in Pursuit-Evasion Games | Jun 3, 2025 | Continual Learning Reinforcement Learning (RL) | Unverified | 0
Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains | Jun 2, 2025 | Math Reinforcement Learning (RL) | Unverified | 0
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning | Jun 2, 2025 | Multimodal Reasoning reinforcement-learning | Unverified | 0
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models | Jun 2, 2025 | Instruction Following Reinforcement Learning (RL) | Code Available | 1
KDRL: Post-Training Reasoning LLMs via Unified Knowledge Distillation and Reinforcement Learning | Jun 2, 2025 | Knowledge Distillation Large Language Model | Unverified | 0
Trajectory First: A Curriculum for Discovering Diverse Policies | Jun 2, 2025 | Diversity Reinforcement Learning (RL) | Unverified | 0
Data-assimilated model-informed reinforcement learning | Jun 2, 2025 | model reinforcement-learning | Unverified | 0
Reasoning-Table: Exploring Reinforcement Learning for Table Reasoning | Jun 2, 2025 | Fact Verification Language Modeling | Code Available | 2
A Reinforcement Learning Approach for RIS-aided Fair Communications | Jun 1, 2025 | Fairness reinforcement-learning | Unverified | 0
DriveMind: A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving | Jun 1, 2025 | Autonomous Driving Decoder | Unverified | 0
ARIA: Training Language Agents with Intention-Driven Reward Aggregation | May 31, 2025 | Decision Making Reinforcement Learning (RL) | Unverified | 0
MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning | May 31, 2025 | Diagnostic Reinforcement Learning (RL) | Unverified | 0
Reinforcement Learning for Hanabi | May 31, 2025 | Card Games Deep Reinforcement Learning | Unverified | 0
Balancing Profit and Fairness in Risk-Based Pricing Markets | May 30, 2025 | Fairness Reinforcement Learning (RL) | Unverified | 0
MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models | May 30, 2025 | reinforcement-learning Reinforcement Learning | Code Available | 0
Reason-SVG: Hybrid Reward RL for Aha-Moments in Vector Graphics Generation | May 30, 2025 | Reinforcement Learning (RL) Vector Graphics | Unverified | 0
Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning | May 30, 2025 | Question Answering Reinforcement Learning (RL) | Unverified | 0
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning | May 30, 2025 | ARC Reinforcement Learning (RL) | Unverified | 0
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | May 30, 2025 | Math Multiple-choice | Code Available | 0
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | May 30, 2025 | GPU Math | Code Available | 7
ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL | May 30, 2025 | Image Generation Language Modeling | Code Available | 2
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models | May 30, 2025 | Reinforcement Learning (RL) | Code Available | 5
Towards Effective Code-Integrated Reasoning | May 30, 2025 | Mathematical Reasoning Reinforcement Learning (RL) | Code Available | 1