SOTAVerified

Q-Learning

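Q-learning is a model-free reinforcement learning algorithm. Its goal is to learn a policy that tells an agent which action to take in each state. It does so by estimating the value Q(s, a) of taking action a in state s and then choosing the highest-valued action.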
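
As a rough illustration of how such a policy can be learned, the sketch below applies the standard tabular update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)] to a toy problem. Everything specific here is an assumption made for the example and does not come from any of the listed papers: the five-state chain environment, the function names `q_learning_update`, `greedy_policy`, and `train_chain`, the uniformly random behavior policy, and the values of `alpha` and `gamma`.

```python
import random


def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place; return the new Q(s, a)."""
    # Terminal transitions have no future value to bootstrap from.
    best_next = 0.0 if done else max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def greedy_policy(q):
    """Pick the highest-valued action in every state (ties go to the lower index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


def train_chain(n_states=5, episodes=500, max_steps=100, seed=0):
    """Hypothetical toy task: walk a chain of states, reward 1.0 at the right end.

    Actions: 0 = step left (stays put at state 0), 1 = step right.
    The behavior policy is uniformly random; Q-learning is off-policy,
    so it can still learn the values of the greedy policy.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = rng.randrange(2)
            next_state = max(state - 1, 0) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            q_learning_update(q, state, action, reward, next_state, done)
            if done:
                break
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train_chain()
    print(greedy_policy(q_table))
```

After training, the greedy policy read off the table is expected to choose action 1 (step right) in every non-terminal state, since that is the shortest route to the reward. The entry for the terminal state is never updated and carries no meaning.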

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 1801–1825 of 1918 papers

Title | Status | Hype
Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control | Code | 0
Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles | Code | 0
QFlip: An Adaptive Reinforcement Learning Strategy for the FlipIt Security Game | Code | 0
Decision Making in Non-Stationary Environments with Policy-Augmented Search | Code | 0
Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks | Code | 0
Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery | Code | 0
Combining No-regret and Q-learning | Code | 0
Playing Doom with SLAM-Augmented Deep Reinforcement Learning | Code | 0
Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition | Code | 0
Playing FPS Games with Deep Reinforcement Learning | Code | 0
Regularized Q-learning through Robust Averaging | Code | 0
Policy Learning for Malaria Control | Code | 0
A DQN-based Approach to Finding Precise Evidences for Fact Verification | Code | 0
EASpace: Enhanced Action Space for Policy Transfer | Code | 0
Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations | Code | 0
A Statistical Analysis of Polyak-Ruppert Averaged Q-learning | Code | 0
Augmented Q Imitation Learning (AQIL) | Code | 0
Superior Genetic Algorithms for the Target Set Selection Problem Based on Power-Law Parameter Choices and Simple Greedy Heuristics | Code | 0
CytonRL: an Efficient Reinforcement Learning Open-source Toolkit Implemented in C++ | Code | 0
Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods | Code | 0
Combinational Q-Learning for Dou Di Zhu | Code | 0
POPO: Pessimistic Offline Policy Optimization | Code | 0
Crowd Intelligence for Early Misinformation Prediction on Social Media | Code | 0
Deep Reinforcement Learning for Traffic Light Control in Vehicular Networks | Code | 0
Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement | Code | 0

No leaderboard results yet.