SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.

(Image credit: Playing Atari with Deep Reinforcement Learning)
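To make the description above concrete, here is a minimal sketch of tabular Q-learning with an epsilon-greedy behaviour policy. It is not taken from this page or from any paper listed here. The function names (`q_learning`, `greedy_policy`, `chain_step`), the five-state corridor environment, and all hyperparameter values are assumptions chosen for illustration only.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning sketch; `step(state, action)` is a hypothetical
    environment function returning (next_state, reward, done)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0  # assumption: every episode starts in state 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a highest-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next-state value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Map each state to the index of its highest-valued action."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


def chain_step(state, action):
    """Hypothetical corridor with states 0..4: action 1 moves right,
    action 0 moves left; reaching state 4 pays reward 1 and ends the episode."""
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
policy = greedy_policy(q_table)
```

In this toy setting the learned table is the "policy" in the sense of the description: reading off the best action per state tells the agent what to do in each circumstance. State 4 is terminal and never updated, so its entry in `policy` carries no meaning.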

Papers

Showing 1551–1575 of 1918 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Generalized Minimax Q-learning Algorithm for Two-Player Zero-Sum Stochastic Games | | 0 |
| Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle | | 0 |
| Variance-reduced Q-learning is minimax optimal | | 0 |
| Deep Reinforcement Learning with Discrete Normalized Advantage Functions for Resource Management in Network Slicing | | 0 |
| "Did You Hear That?" Learning to Play Video Games from Audio Cues | | 0 |
| Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past | Code | 1 |
| Escaping the State of Nature: A Hobbesian Approach to Cooperation in Multi-agent Reinforcement Learning | | 0 |
| Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning | | 0 |
| Risk-Sensitive Compact Decision Trees for Autonomous Execution in Presence of Simulated Market Response | | 0 |
| Deep Q-Learning for Directed Acyclic Graph Generation | | 0 |
| On-board Deep Q-Network for UAV-assisted Online Power Transfer and Data Collection | | 0 |
| Reinforcement Learning with Low-Complexity Liquid State Machines | Code | 0 |
| Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction | Code | 0 |
| Feature-Based Q-Learning for Two-Player Stochastic Games | | 0 |
| RSS-Based Q-Learning for Indoor UAV Navigation | | 0 |
| Provably Efficient Q-Learning with Low Switching Cost | | 0 |
| Learning NP-Hard Multi-Agent Assignment Planning using GNN: Inference on a Random Graph and Provable Auction-Fitted Q-learning | | 0 |
| Reinforcement Learning for Slate-based Recommender Systems: A Tractable Decomposition and Practical Methodology | | 0 |
| A General Markov Decision Process Framework for Directly Learning Optimal Control Policies | | 0 |
| Solving NP-Hard Problems on Graphs with Extended AlphaGo Zero | Code | 0 |
| Finite-Sample Analysis of Nonlinear Stochastic Approximation with Applications in Reinforcement Learning | Code | 0 |
| SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards | Code | 1 |
| Prioritized Sequence Experience Replay | | 0 |
| A Kernel Loss for Solving the Bellman Equation | Code | 0 |
| MQLV: Optimal Policy of Money Management in Retail Banking with Q-Learning | | 0 |

No leaderboard results yet.