Counterfactual Explanations for Continuous Action Reinforcement Learning | May 19, 2025 | counterfactual, reinforcement-learning | Code Available | 0
Exploiting Symbolic Heuristics for the Synthesis of Domain-Specific Temporal Planning Guidance using Reinforcement Learning | May 19, 2025 | Reinforcement Learning (RL) | Unverified | 0
Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization | May 19, 2025 | Offline RL, Portfolio Optimization | Unverified | 0
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis | May 19, 2025 | All, Multi-Armed Bandits | Unverified | 0
DGRO: Enhancing LLM Reasoning via Exploration-Exploitation Control and Reward Variance Management | May 19, 2025 | Management, Reinforcement Learning (RL) | Unverified | 0
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding | May 19, 2025 | Code Generation, Code Translation | Unverified | 0
Power Allocation for Delay Optimization in Device-to-Device Networks: A Graph Reinforcement Learning Approach | May 19, 2025 | Fairness, Reinforcement Learning (RL) | Unverified | 0
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving | May 19, 2025 | Reinforcement Learning (RL) | Unverified | 0
Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning | May 19, 2025 | D4RL, model | Unverified | 0
Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning | May 19, 2025 | D4RL, Model-based Reinforcement Learning | Unverified | 0
Benchmarking MOEAs for solving continuous multi-objective RL problems | May 19, 2025 | Benchmarking, Evolutionary Algorithms | Code Available | 0
AbFlowNet: Optimizing Antibody-Antigen Binding Energy via Diffusion-GFlowNet Fusion | May 18, 2025 | Reinforcement Learning (RL) | Unverified | 0
Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents | May 18, 2025 | Decision Making, Reinforcement Learning (RL) | Unverified | 0
Resolving Latency and Inventory Risk in Market Making with Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL) | Unverified | 0
UIShift: Enhancing VLM-based GUI Agents through Self-supervised Reinforcement Learning | May 18, 2025 | 2k, Reinforcement Learning (RL) | Unverified | 0
A Finite-Sample Analysis of Distributionally Robust Average-Reward Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL) | Unverified | 0
Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios | May 18, 2025 | Autonomous Driving, Reinforcement Learning (RL) | Unverified | 0
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning | May 18, 2025 | Reinforcement Learning (RL) | Code Available | 0
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning | May 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
J1: Exploring Simple Test-Time Scaling for LLM-as-a-Judge | May 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling | May 17, 2025 | Decision Making, reinforcement-learning | Unverified | 0
Retrospex: Language Agent Meets Offline Reinforcement Learning Critic | May 17, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning | May 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
Online Iterative Self-Alignment for Radiology Report Generation | May 17, 2025 | Reinforcement Learning (RL) | Unverified | 0
An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | May 16, 2025 | Form, Language Modeling | Code Available | 0
Unveiling the Black Box: A Multi-Layer Framework for Explaining Reinforcement Learning-Based Cyber Agents | May 16, 2025 | CyberBattleSim, Reinforcement Learning (RL) | Unverified | 0
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs | May 16, 2025 | Mathematical Problem-Solving, Reinforcement Learning (RL) | Unverified | 0
Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes | May 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
ShiQ: Bringing back Bellman to LLMs | May 16, 2025 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions | May 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM | May 16, 2025 | Language Modeling, Language Modelling | Code Available | 0
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL | May 16, 2025 | Reinforcement Learning (RL) | Code Available | 0
Developing and Integrating Trust Modeling into Multi-Objective Reinforcement Learning for Intelligent Agricultural Management | May 16, 2025 | Management, Multi-Objective Reinforcement Learning | Unverified | 0
Attention-Based Reward Shaping for Sparse and Delayed Rewards | May 16, 2025 | Reinforcement Learning (RL) | Code Available | 0
Reinforcement Learning for AMR Charging Decisions: The Impact of Reward and Action Space Design | May 16, 2025 | Reinforcement Learning (RL) | Unverified | 0
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO | May 16, 2025 | All, Diversity | Unverified | 0
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs | May 16, 2025 | Question Answering, Reinforcement Learning (RL) | Code Available | 0
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models | May 16, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0
Efficient Adaptation of Reinforcement Learning Agents to Sudden Environmental Change | May 15, 2025 | Decision Making, Deep Reinforcement Learning | Unverified | 0
Fine-tuning Diffusion Policies with Backpropagation Through Diffusion Timesteps | May 15, 2025 | Autonomous Driving, Denoising | Unverified | 0
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning | May 15, 2025 | Efficient Exploration, Imitation Learning | Code Available | 0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | May 15, 2025 | Code Generation, GSM8K | Unverified | 0
Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation | May 15, 2025 | Reinforcement Learning (RL), Transfer Learning | Unverified | 0
TensorRL-QAS: Reinforcement learning with tensor networks for scalable quantum architecture search | May 14, 2025 | Reinforcement Learning (RL), Tensor Networks | Unverified | 0
Reinforcement Learning for Individual Optimal Policy from Heterogeneous Data | May 14, 2025 | Offline RL, reinforcement-learning | Unverified | 0
Risk-Aware Safe Reinforcement Learning for Control of Stochastic Linear Systems | May 14, 2025 | Reinforcement Learning (RL), Safe Reinforcement Learning | Unverified | 0
CEC-Zero: Chinese Error Correction Solution Based on LLM | May 14, 2025 | Domain Generalization, Reinforcement Learning (RL) | Unverified | 0
Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | May 13, 2025 | Deep Reinforcement Learning, Intrusion Detection | Unverified | 0
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning | May 13, 2025 | Meta-Learning, Reinforcement Learning (RL) | Unverified | 0
Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles | May 13, 2025 | Autonomous Vehicles, GPU | Unverified | 0