| Regret of exploratory policy improvement and q-learning | Nov 2, 2024 | Q-Learning | Unverified | 0 |
| HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search | Nov 1, 2024 | Q-Learning | Unverified | 0 |
| Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis | Oct 31, 2024 | Q-Learning | Code Available | 0 |
| Zonal RL-RRT: Integrated RL-RRT Path Planning with Collision Probability and Zone Connectivity | Oct 31, 2024 | MuJoCo, Q-Learning | Code Available | 1 |
| Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation | Oct 30, 2024 | Offline RL, Q-Learning | Unverified | 0 |
| Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem | Oct 29, 2024 | Q-Learning, Stochastic Optimization | Unverified | 0 |
| Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model | Oct 27, 2024 | D4RL, Q-Learning | Code Available | 0 |
| Optimizing Load Scheduling in Power Grids Using Reinforcement Learning and Markov Decision Processes | Oct 23, 2024 | Management, Q-Learning | Unverified | 0 |
| A Novel Reinforcement Learning Model for Post-Incident Malware Investigations | Oct 19, 2024 | Malware Detection, Q-Learning | Unverified | 0 |
| Streaming Deep Reinforcement Learning Finally Works | Oct 18, 2024 | Atari Games, Deep Reinforcement Learning | Code Available | 3 |
| Reward-free World Models for Online Imitation Learning | Oct 17, 2024 | Imitation Learning, Q-Learning | Code Available | 1 |
| Multi-Objective-Optimization Multi-AUV Assisted Data Collection Framework for IoUT Based on Offline Reinforcement Learning | Oct 15, 2024 | Collision Avoidance, Offline RL | Unverified | 0 |
| MFC-EQ: Mean-Field Control with Envelope Q-Learning for Moving Decentralized Agents in Formation | Oct 15, 2024 | Multi-Agent Path Finding, Q-Learning | Unverified | 0 |
| Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space | Oct 15, 2024 | Autonomous Vehicles, Q-Learning | Unverified | 0 |
| Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search | Oct 15, 2024 | Q-Learning | Unverified | 0 |
| DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation | Oct 15, 2024 | Decision Making, Offline RL | Unverified | 0 |
| Diffusion-Based Offline RL for Improved Decision-Making in Augmented ARC Task | Oct 15, 2024 | ARC, Decision Making | Unverified | 0 |
| Asymptotic Analysis of Sample-averaged Q-learning | Oct 14, 2024 | OpenAI Gym, Q-Learning | Unverified | 0 |
| Online waveform selection for cognitive radar | Oct 14, 2024 | Q-Learning | Unverified | 0 |
| Hybrid LLM-DDQN based Joint Optimization of V2I Communication and Autonomous Driving | Oct 11, 2024 | Autonomous Driving, Decision Making | Unverified | 0 |
| UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations | Oct 10, 2024 | Imitation Learning, Q-Learning | Code Available | 0 |
| Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition | Oct 10, 2024 | Q-Learning | Unverified | 0 |
| VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | Oct 10, 2024 | Mathematical Reasoning, Q-Learning | Unverified | 0 |
| Optimized Resource Allocation for Cloud-Native 6G Networks: Zero-Touch ML Models in Microservices-based VNF Deployments | Oct 9, 2024 | Management, Q-Learning | Unverified | 0 |
| Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming | Oct 9, 2024 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0 |
| Learning in complex action spaces without policy gradients | Oct 8, 2024 | Policy Gradient Methods, Q-Learning | Unverified | 0 |
| Reinforcenment Learning-Aided NOMA Random Access: An AoI-Based Timeliness Perspective | Oct 4, 2024 | Q-Learning | Unverified | 0 |
| Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning | Oct 2, 2024 | Decision Making, Q-Learning | Unverified | 0 |
| Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm for Hybrid Flow Shop Scheduling Problems with Multiple Parallel Batch Processing Stages | Sep 27, 2024 | Q-Learning, Scheduling | Unverified | 0 |
| Reinforcement Learning for Finite Space Mean-Field Type Games | Sep 25, 2024 | Deep Reinforcement Learning, Q-Learning | Unverified | 0 |
| Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment | Sep 25, 2024 | Decision Making, Q-Learning | Unverified | 0 |
| Agent-state based policies in POMDPs: Beyond belief-state MDPs | Sep 24, 2024 | Q-Learning | Unverified | 0 |
| A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization | Sep 24, 2024 | Q-Learning | Code Available | 0 |
| Learning to Play Video Games with Intuitive Physics Priors | Sep 20, 2024 | Decision Making, Object | Unverified | 0 |
| Data-Efficient Quadratic Q-Learning Using LMIs | Sep 18, 2024 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0 |
| Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning | Sep 17, 2024 | Deep Reinforcement Learning, Q-Learning | Unverified | 0 |
| Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling | Sep 16, 2024 | Combinatorial Optimization, counterfactual | Code Available | 0 |
| Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments | Sep 16, 2024 | Audio Signal Processing, Deep Reinforcement Learning | Code Available | 0 |
| SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning | Sep 16, 2024 | Deep Reinforcement Learning, Optical Flow Estimation | Unverified | 0 |
| KAN v.s. MLP for Offline Reinforcement Learning | Sep 15, 2024 | D4RL, Kolmogorov-Arnold Networks | Unverified | 0 |
| Autonomous Vehicle Decision-Making Framework for Considering Malicious Behavior at Unsignalized Intersections | Sep 11, 2024 | Autonomous Vehicles, Decision Making | Unverified | 0 |
| Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning | Sep 10, 2024 | Deep Reinforcement Learning, OpenAI Gym | Code Available | 0 |
| Reinforcement Learning for Rate Maximization in IRS-aided OWC Networks | Sep 7, 2024 | Q-Learning, reinforcement-learning | Unverified | 0 |
| Reward-Directed Score-Based Diffusion Models via q-Learning | Sep 7, 2024 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0 |
| Faster Q-Learning Algorithms for Restless Bandits | Sep 6, 2024 | Multi-Armed Bandits, Q-Learning | Unverified | 0 |
| Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | Sep 6, 2024 | Multi-Armed Bandits, Q-Learning | Unverified | 0 |
| On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments | Sep 5, 2024 | Q-Learning | Unverified | 0 |
| Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning | Sep 5, 2024 | Q-Learning, reinforcement-learning | Unverified | 0 |
| Robust Q-Learning under Corrupted Rewards | Sep 5, 2024 | Q-Learning | Code Available | 0 |
| Reinforcement Learning-enabled Satellite Constellation Reconfiguration and Retasking for Mission-Critical Applications | Sep 3, 2024 | Q-Learning, Reinforcement Learning (RL) | Unverified | 0 |