| Untangling Braids with Multi-agent Q-Learning | Sep 29, 2021 | OpenAI Gym, Q-Learning | —Unverified | 0 | 0 |
| Urban traffic dynamic rerouting framework: A DRL-based model with fog-cloud architecture | Oct 11, 2021 | Graph Attention, Q-Learning | —Unverified | 0 | 0 |
| User Tampering in Reinforcement Learning Recommender Systems | Sep 9, 2021 | Q-Learning, Recommendation Systems | —Unverified | 0 | 0 |
| Using a Deep Reinforcement Learning Agent for Traffic Signal Control | Nov 3, 2016 | Deep Reinforcement Learning, Q-Learning | —Unverified | 0 | 0 |
| Using Deep Q-Learning to Control Optimization Hyperparameters | Feb 12, 2016 | Q-Learning, Reinforcement Learning | —Unverified | 0 | 0 |
| Using Deep Q-Learning to Dynamically Toggle between Push/Pull Actions in Computational Trust Mechanisms | Apr 28, 2024 | Q-Learning | —Unverified | 0 | 0 |
| Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners | Sep 5, 2020 | Q-Learning | —Unverified | 0 | 0 |
| Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution | Jun 29, 2020 | Q-Learning, reinforcement-learning | —Unverified | 0 | 0 |
| Using Reinforcement Learning to Optimize Responses in Care Processes: A Case Study on Aggression Incidents | Oct 2, 2023 | Q-Learning | —Unverified | 0 | 0 |
| Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning | Mar 31, 2024 | Atari Games, Q-Learning | —Unverified | 0 | 0 |
| VA-learning as a more efficient alternative to Q-learning | May 29, 2023 | Q-Learning | —Unverified | 0 | 0 |
| Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings | Jul 28, 2021 | continuous-control, Continuous Control | —Unverified | 0 | 0 |
| Value function interference and greedy action selection in value-based multi-objective reinforcement learning | Feb 9, 2024 | Multi-Objective Reinforcement Learning, Q-Learning | —Unverified | 0 | 0 |
| Value-of-Information based Arbitration between Model-based and Model-free Control | Dec 8, 2019 | Computational Efficiency, model | —Unverified | 0 | 0 |
| Value Penalized Q-Learning for Recommender Systems | Oct 15, 2021 | Offline RL, Q-Learning | —Unverified | 0 | 0 |
| Value Refinement Network (VRN) | Sep 29, 2021 | Q-Learning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm | Jun 17, 2023 | Atari Games, Q-Learning | —Unverified | 0 | 0 |
| Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity | Aug 13, 2024 | Q-Learning | —Unverified | 0 | 0 |
| Variance-reduced Q-learning is minimax optimal | Jun 11, 2019 | Q-Learning | —Unverified | 0 | 0 |
| Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient | Jul 25, 2020 | Q-Learning, reinforcement-learning | —Unverified | 0 | 0 |
| Variance Reduction Methods for Sublinear Reinforcement Learning | Feb 26, 2018 | Q-Learning, reinforcement-learning | —Unverified | 0 | 0 |
| Variational Bayesian Reinforcement Learning with Regret Bounds | Jul 25, 2018 | Q-Learning, reinforcement-learning | —Unverified | 0 | 0 |
| Variational quantum compiling with double Q-learning | Mar 22, 2021 | Q-Learning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Vehicle management in a modular production context using Deep Q-Learning | May 6, 2022 | Deep Reinforcement Learning, Job Shop Scheduling | —Unverified | 0 | 0 |
| Verification of Dissipativity and Evaluation of Storage Function in Economic Nonlinear MPC using Q-Learning | May 24, 2021 | Q-Learning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | Oct 10, 2024 | Mathematical Reasoning, Q-Learning | —Unverified | 0 | 0 |
| Video Summarisation by Classification with Deep Reinforcement Learning | Jul 9, 2018 | Classification, Decision Making | —Unverified | 0 | 0 |
| Virtual Autonomous Driving with Reinforcement Learning | Dec 14, 2020 | Autonomous Driving, Q-Learning | —Unverified | 0 | 0 |
| VistaFlow: Photorealistic Volumetric Reconstruction with Dynamic Resolution Management via Q-Learning | Feb 5, 2025 | CPU, Management | —Unverified | 0 | 0 |
| Visual Radial Basis Q-Network | Jun 14, 2022 | Q-Learning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, & Snapshot Ensembling | Jan 3, 2018 | Q-Learning, Reinforcement Learning | —Unverified | 0 | 0 |
| V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL | Oct 27, 2021 | Medical Visual Question Answering, Q-Learning | —Unverified | 0 | 0 |
| VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making | May 6, 2025 | Decision Making, General Knowledge | —Unverified | 0 | 0 |
| VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation | Dec 12, 2022 | Q-Learning, regression | —Unverified | 0 | 0 |
| Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control | Mar 4, 2023 | MuJoCo, Q-Learning | —Unverified | 0 | 0 |
| Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog | Jun 30, 2019 | Deep Reinforcement Learning, Open-Domain Dialog | —Unverified | 0 | 0 |
| Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog | Jan 1, 2020 | Deep Reinforcement Learning, OpenAI Gym | —Unverified | 0 | 0 |
| Weakly Coupled Deep Q-Networks | Oct 28, 2023 | Deep Reinforcement Learning, Q-Learning | —Unverified | 0 | 0 |
| Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates | Jan 1, 2021 | Deep Reinforcement Learning, Q-Learning | —Unverified | 0 | 0 |
| Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments | Feb 23, 2018 | Deep Reinforcement Learning, Q-Learning | —Unverified | 0 | 0 |
| "What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | Jun 11, 2025 | Diversity, Q-Learning | —Unverified | 0 | 0 |
| What Would pi* Do?: Imitation Learning via Off-Policy Reinforcement Learning | Sep 27, 2018 | Imitation Learning, Q-Learning | —Unverified | 0 | 0 |
| Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs | Oct 13, 2023 | Decision Making, Multi-Armed Bandits | —Unverified | 0 | 0 |
| When a Reinforcement Learning Agent Encounters Unknown Unknowns | May 19, 2025 | AI Agent, Q-Learning | —Unverified | 0 | 0 |
| When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms | May 23, 2018 | Efficient Exploration, Q-Learning | —Unverified | 0 | 0 |
| Where to Look: A Unified Attention Model for Visual Recognition with Reinforcement Learning | Nov 13, 2021 | Q-Learning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| Which Channel to Ask My Question? Personalized Customer Service Request Stream Routing using Deep Reinforcement Learning | Nov 24, 2019 | Chatbot, Deep Reinforcement Learning | —Unverified | 0 | 0 |
| Whittle index based Q-learning for restless bandits with average reward | Apr 29, 2020 | Q-Learning, reinforcement-learning | —Unverified | 0 | 0 |
| Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | Sep 6, 2024 | Multi-Armed Bandits, Q-Learning | —Unverified | 0 | 0 |
| Whittle's index-based age-of-information minimization in multi-energy harvesting source networks | Aug 5, 2024 | Q-Learning, Scheduling | —Unverified | 0 | 0 |