SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.

(Image credit: Playing Atari with Deep Reinforcement Learning)
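As a rough illustration of the description above, here is a minimal tabular Q-learning sketch. The 1-D chain environment, the function names `q_learning` and `greedy_policy`, and all hyperparameter values are assumptions made for this example; they are not taken from any paper listed on this page.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    The agent starts in state 0 and receives reward 1 on reaching the last
    state, which ends the episode. Every other transition gives reward 0.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 moves left, action 1 moves right
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Deterministic transition, clamped to the ends of the chain.
            s_next = min(max(s + moves[a], 0), goal)
            r = 1.0 if s_next == goal else 0.0

            # Q-learning update: bootstrap from the best next-state value,
            # treating the terminal state as having value 0.
            bootstrap = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + bootstrap - q[s][a])
            s = s_next
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table)[:-1])  # greedy actions for non-terminal states
```

The learned table maps each state to action values, and the policy is read off by taking the highest-valued action in each state. Deep Q-learning methods replace the table with a neural network, but the update target has the same form.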

Papers

Showing 701-750 of 1918 papers

Title | Status | Hype
Exploratory Control with Tsallis Entropy for Latent Factor Models | - | 0
On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization | - | 0
Reinforcement Learning in Non-Markovian Environments | - | 0
Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints | - | 0
Deep Reinforcement Learning for Power Control in Next-Generation WiFi Network Systems | - | 0
DynamicLight: Two-Stage Dynamic Traffic Signal Timing | Code | 0
Quantum deep recurrent reinforcement learning | - | 0
Attitude Control of Highly Maneuverable Aircraft Using an Improved Q-learning | - | 0
Solving Continuous Control via Q-learning | Code | 1
Sufficient Exploration for Convex Q-learning | - | 0
Mutual Information Regularized Offline Reinforcement Learning | Code | 0
Model-Free Characterizations of the Hamilton-Jacobi-Bellman Equation and Convex Q-Learning in Continuous Time | - | 0
Deep reinforcement learning for automatic run-time adaptation of UWB PHY radio settings | - | 0
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient | Code | 1
Sustainable Online Reinforcement Learning for Auto-bidding | Code | 1
Censored Deep Reinforcement Patrolling with Information Criterion for Monitoring Large Water Resources using Autonomous Surface Vehicles | - | 0
DQLAP: Deep Q-Learning Recommender Algorithm with Update Policy for a Real Steam Turbine System | - | 0
Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials | Code | 1
Factors of Influence of the Overestimation Bias of Q-Learning | Code | 0
Reinforcement Learning Approach for Multi-Agent Flexible Scheduling Problems | - | 0
Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning | Code | 0
Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders | - | 0
Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient | - | 0
Bayesian Q-learning With Imperfect Expert Demonstrations | - | 0
Deep Recurrent Q-learning for Energy-constrained Coverage with a Mobile Robot | - | 0
Application of Deep Q Learning with Simulation Results for Elevator Optimization | - | 0
Efficient LSTM Training with Eligibility Traces | - | 0
Robust Q-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty | Code | 1
On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly Communicating MDPs | - | 0
Predictive Crypto-Asset Automated Market Making Architecture for Decentralized Finance using Deep Reinforcement Learning | - | 0
FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations | - | 0
Understanding Hindsight Goal Relabeling from a Divergence Minimization Perspective | - | 0
Revisiting Discrete Soft Actor-Critic | Code | 1
MAN: Multi-Action Networks Learning | Code | 1
Comparative Study of Q-Learning and NeuroEvolution of Augmenting Topologies for Self Driving Agents | - | 0
MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning | - | 0
Reinforcement Learning-Based Cooperative P2P Power Trading between DC Nanogrid Clusters with Wind and PV Energy Resources | - | 0
M^2DQN: A Robust Method for Accelerating Deep Q-learning Network | Code | 0
IoT-Aerial Base Station Task Offloading with Risk-Sensitive Reinforcement Learning for Smart Agriculture | - | 0
Deep Reinforcement Learning for Task Offloading in UAV-Aided Smart Farm Networks | - | 0
Structured Q-learning For Antibody Design | - | 0
Route Planning for Last-Mile Deliveries Using Mobile Parcel Lockers: A Hybrid Q-Learning Network Approach | Code | 0
Reward Delay Attacks on Deep Reinforcement Learning | Code | 0
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL | - | 0
Double Q-Learning for Citizen Relocation During Natural Hazards | - | 0
On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs | - | 0
SlateFree: a Model-Free Decomposition for Reinforcement Learning with Slate Actions | - | 0
A Technique to Create Weaker Abstract Board Game Agents via Reinforcement Learning | - | 0
Partial Counterfactual Identification for Infinite Horizon Partially Observable Markov Decision Process | - | 0
Direct Data-Driven Discrete-time Bilinear Biquadratic Regulator | - | 0

No leaderboard results yet.