SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
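
As an illustration of how such a policy can be learned, here is a minimal sketch of tabular Q-learning. Everything in it is an assumption made for this example and is not taken from any paper listed on this page: the function names `q_learning`, `greedy_policy`, and `chain_step`, the five-state chain environment, and the hyperparameter values.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    `step(state, action)` is a hypothetical environment function that
    returns (next_state, reward, done). Every episode starts in state 0.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, action)
            # Bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Map each state to the action with the highest learned Q-value."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


def chain_step(state, action):
    """Toy chain of states 0..4: action 1 moves right, action 0 moves left.

    Reaching state 4 yields reward 1.0 and ends the episode.
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


if __name__ == "__main__":
    q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
    print(greedy_policy(q_table))
```

On this toy chain, the learned greedy policy should choose "move right" (action 1) in every non-terminal state. The entry for the terminal state 4 carries no meaning, because its Q-values are never updated.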

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 1401-1450 of 1918 papers

Title | Status | Hype
Untangling Braids with Multi-agent Q-Learning | - | 0
Urban traffic dynamic rerouting framework: A DRL-based model with fog-cloud architecture | - | 0
User Tampering in Reinforcement Learning Recommender Systems | - | 0
Using a Deep Reinforcement Learning Agent for Traffic Signal Control | - | 0
Using Deep Q-Learning to Control Optimization Hyperparameters | - | 0
Using Deep Q-Learning to Dynamically Toggle between Push/Pull Actions in Computational Trust Mechanisms | - | 0
Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners | - | 0
Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution | - | 0
Using Reinforcement Learning to Optimize Responses in Care Processes: A Case Study on Aggression Incidents | - | 0
Utilizing Maximum Mean Discrepancy Barycenter for Propagating the Uncertainty of Value Functions in Reinforcement Learning | - | 0
VA-learning as a more efficient alternative to Q-learning | - | 0
Value-Based Reinforcement Learning for Continuous Control Robotic Manipulation in Multi-Task Sparse Reward Settings | - | 0
Value function interference and greedy action selection in value-based multi-objective reinforcement learning | - | 0
Value-of-Information based Arbitration between Model-based and Model-free Control | - | 0
Value Penalized Q-Learning for Recommender Systems | - | 0
Value Refinement Network (VRN) | - | 0
Vanishing Bias Heuristic-guided Reinforcement Learning Algorithm | - | 0
Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity | - | 0
Variance-reduced Q-learning is minimax optimal | - | 0
Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient | - | 0
Variance Reduction Methods for Sublinear Reinforcement Learning | - | 0
Variational Bayesian Reinforcement Learning with Regret Bounds | - | 0
Variational quantum compiling with double Q-learning | - | 0
Vehicle management in a modular production context using Deep Q-Learning | - | 0
Verification of Dissipativity and Evaluation of Storage Function in Economic Nonlinear MPC using Q-Learning | - | 0
VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | - | 0
Video Summarisation by Classification with Deep Reinforcement Learning | - | 0
Virtual Autonomous Driving with Reinforcement Learning | - | 0
VistaFlow: Photorealistic Volumetric Reconstruction with Dynamic Resolution Management via Q-Learning | - | 0
Visual Radial Basis Q-Network | - | 0
ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, & Snapshot Ensembling | - | 0
V-Learning -- A Simple, Efficient, Decentralized Algorithm for Multiagent RL | - | 0
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making | - | 0
VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation | - | 0
Wasserstein Actor-Critic: Directed Exploration via Optimism for Continuous-Actions Control | - | 0
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog | - | 0
Way Off-Policy Batch Deep Reinforcement Learning of Human Preferences in Dialog | - | 0
Weakly Coupled Deep Q-Networks | - | 0
Weighted Bellman Backups for Improved Signal-to-Noise in Q-Updates | - | 0
Weighted Double Deep Multiagent Reinforcement Learning in Stochastic Cooperative Environments | - | 0
"What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | - | 0
What Would pi* Do?: Imitation Learning via Off-Policy Reinforcement Learning | - | 0
Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs | - | 0
When a Reinforcement Learning Agent Encounters Unknown Unknowns | - | 0
When Simple Exploration is Sample Efficient: Identifying Sufficient Conditions for Random Exploration to Yield PAC RL Algorithms | - | 0
Where to Look: A Unified Attention Model for Visual Recognition with Reinforcement Learning | - | 0
Which Channel to Ask My Question? Personalized Customer Service Request Stream Routing using Deep Reinforcement Learning | - | 0
Whittle index based Q-learning for restless bandits with average reward | - | 0
Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | - | 0
Whittle's index-based age-of-information minimization in multi-energy harvesting source networks | - | 0

No leaderboard results yet.