SOTAVerified

Q-Learning

Q-learning is a model-free reinforcement learning algorithm. Its goal is to learn an action-value function Q(s, a), from which the agent derives a policy that tells it which action to take in each state.
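As a rough illustration of the idea, here is a minimal sketch of tabular Q-learning. The corridor environment, the function names `train_q_table` and `greedy_policy`, and all hyperparameter values are assumptions made for this example; none of them come from the papers listed on this page.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left (staying put at state 0),
    action 1 moves right. Entering the last state yields reward 1 and ends
    the episode; every other transition yields reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # The terminal state has no future value.
            best_next = 0.0 if next_state == goal else max(q[next_state])

            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            td_target = reward + gamma * best_next
            q[state][action] += alpha * (td_target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    q_table = train_q_table()
    print(greedy_policy(q_table))
```

In this toy setting the learned policy for every non-terminal state is to move right (action 1), since that is the only way to reach the reward. The policy is never stored directly; it is read off the Q-table by taking the highest-valued action in each state.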

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 101-125 of 1918 papers

Title | Status | Hype
Q-learning with Language Model for Edit-based Unsupervised Summarization | Code | 1
EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models | Code | 1
Energy-based Surprise Minimization for Multi-Agent Value Factorization | Code | 1
Deep Active Inference for Partially Observable MDPs | Code | 1
Table2Charts: Recommending Charts by Learning Shared Table Representations | Code | 1
Robust Deep Reinforcement Learning through Adversarial Loss | Code | 1
Deep Inverse Q-learning with Constraints | Code | 1
QPLEX: Duplex Dueling Multi-Agent Q-Learning | Code | 1
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning | Code | 1
Neural Interactive Collaborative Filtering | Code | 1
Reward Machines for Cooperative Multi-Agent Reinforcement Learning | Code | 1
Gradient Temporal-Difference Learning with Regularized Corrections | Code | 1
Image Classification by Reinforcement Learning with Two-State Q-Learning | Code | 1
Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning | Code | 1
Semantic Visual Navigation by Watching YouTube Videos | Code | 1
Conservative Q-Learning for Offline Reinforcement Learning | Code | 1
Multi-Agent Determinantal Q-Learning | Code | 1
Modeling Penetration Testing with Reinforcement Learning Using Capture-the-Flag Challenges: Trade-offs between Model-free Learning and A Priori Knowledge | Code | 1
Spatial Action Maps for Mobile Manipulation | Code | 1
Using Deep Reinforcement Learning Methods for Autonomous Vessels in 2D Environments | Code | 1
FlapAI Bird: Training an Agent to Play Flappy Bird Using Reinforcement Learning Techniques | Code | 1
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction | Code | 1
FACMAC: Factored Multi-Agent Centralised Policy Gradients | Code | 1
Optimistic Exploration even with a Pessimistic Initialisation | Code | 1
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning | Code | 1

No leaderboard results yet.