Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents | May 18, 2025 | Decision Making, Reinforcement Learning (RL) | Unverified | 0
A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL) | Unverified | 0
AbFlowNet: Optimizing Antibody-Antigen Binding Energy via Diffusion-GFlowNet Fusion | May 18, 2025 | Reinforcement Learning (RL) | Unverified | 0
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL) | Code Available | 0
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models | May 18, 2025 | Reinforcement Learning (RL) | Code Available | 4
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning | May 18, 2025 | 2k, Reinforcement Learning (RL) | Unverified | 0
VideoRFT: Incentivizing Video Reasoning Capability in MLLMs via Reinforced Fine-Tuning | May 18, 2025 | Reinforcement Learning (RL) | Code Available | 2
Enhancing Visual Grounding for GUI Agents via Self-Evolutionary Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL), Visual Grounding | Code Available | 3
Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward | May 18, 2025 | GPU, Graph Matching | Code Available | 3
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning | May 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning | May 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge | May 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
Retrospex: Language Agent Meets Offline Reinforcement Learning Critic | May 17, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling | May 17, 2025 | Decision Making, reinforcement-learning | Unverified | 0
Online Iterative Self-Alignment for Radiology Report Generation | May 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models | May 16, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs | May 16, 2025 | Question Answering, Reinforcement Learning (RL) | Code Available | 0
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO | May 16, 2025 | All, Diversity | Unverified | 0
Unveiling the Black Box: A Multi-Layer Framework for Explaining Reinforcement Learning-Based Cyber Agents | May 16, 2025 | CyberBattleSim, Reinforcement Learning (RL) | Unverified | 0
An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | May 16, 2025 | Form, Language Modeling | Code Available | 0
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs | May 16, 2025 | Mathematical Problem-Solving, Reinforcement Learning (RL) | Unverified | 0
Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation | May 16, 2025 | Decision Making, Language Modeling | Code Available | 1
Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes | May 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
ShiQ: Bringing back Bellman to LLMs | May 16, 2025 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy | May 16, 2025 | Reinforcement Learning (RL) | Code Available | 2
Group-in-Group Policy Optimization for LLM Agent Training | May 16, 2025 | GPU, Mathematical Reasoning | Code Available | 5
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions | May 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM | May 16, 2025 | Language Modeling, Language Modelling | Code Available | 0
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL | May 16, 2025 | Reinforcement Learning (RL) | Code Available | 0
Developing and Integrating Trust Modeling into Multi-Objective Reinforcement Learning for Intelligent Agricultural Management | May 16, 2025 | Management, Multi-Objective Reinforcement Learning | Unverified | 0
Attention-Based Reward Shaping for Sparse and Delayed Rewards | May 16, 2025 | Reinforcement Learning (RL) | Code Available | 0
Reinforcement Learning for AMR Charging Decisions: The Impact of Reward and Action Space Design | May 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
ImagineBench: Evaluating Reinforcement Learning with Large Language Model Rollouts | May 15, 2025 | Continual Learning, Language Modeling | Code Available | 1
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps | May 15, 2025 | Autonomous Driving, Denoising | Unverified | 0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code Generation, GSM8K | Unverified | 0
Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation | May 15, 2025 | Reinforcement Learning (RL), Transfer Learning | Unverified | 0
Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change | May 15, 2025 | Decision Making, Deep Reinforcement Learning | Unverified | 0
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | May 15, 2025 | Efficient Exploration, Imitation Learning | Code Available | 0
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | May 15, 2025 | Math, reinforcement-learning | Code Available | 2
Risk-Aware Safe Reinforcement Learning for Control of Stochastic Linear Systems | May 14, 2025 | Reinforcement Learning (RL), Safe Reinforcement Learning | Unverified | 0
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | May 14, 2025 | Offline RL, reinforcement-learning | Unverified | 0
TensorRL-QAS: Reinforcement learning with tensor networks for scalable quantum architecture search | May 14, 2025 | Reinforcement Learning (RL), Tensor Networks | Unverified | 0
CEC-Zero: Chinese Error Correction Solution Based on LLM | May 14, 2025 | Domain Generalization, Reinforcement Learning (RL) | Unverified | 0
Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | May 13, 2025 | Deep Reinforcement Learning, Intrusion Detection | Unverified | 0
Generalization in Monitored Markov Decision Processes (Mon-MDPs) | May 13, 2025 | Reinforcement Learning (RL) | Unverified | 0
Adaptive Diffusion Policy Optimization for Robotic Manipulation | May 13, 2025 | continuous-control, Continuous Control | Code Available | 0
DSADF: Thinking Fast and Slow for Decision Making | May 13, 2025 | Decision Making, Reinforcement Learning (RL) | Unverified | 0
Automatic Curriculum Learning for Driving Scenarios: Towards Robust and Efficient Reinforcement Learning | May 13, 2025 | Autonomous Driving, Reinforcement Learning (RL) | Unverified | 0
Preference Optimization for Combinatorial Optimization Problems | May 13, 2025 | Combinatorial Optimization, Reinforcement Learning (RL) | Unverified | 0
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning | May 13, 2025 | Meta-Learning, Reinforcement Learning (RL) | Unverified | 0