SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.

(Image credit: Playing Atari with Deep Reinforcement Learning)
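As an illustration of the description above, here is a minimal tabular Q-learning sketch. The environment is a hypothetical five-state corridor invented for this example (it does not come from any listed paper), and the function names `q_learning` and `greedy_policy`, along with the values for `alpha`, `gamma`, and `epsilon`, are assumptions chosen for readability rather than recommended settings.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (illustrative only).

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Reaching the rightmost state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: pick a random action with probability epsilon,
            # or when both actions are tied; otherwise pick the better one.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Deterministic transition; moving left from state 0 stays put.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: bootstrap from the best action in next_state
            # (the terminal state contributes no future value).
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    table = q_learning()
    # The last entry belongs to the terminal state and carries no meaning.
    print(greedy_policy(table)[:-1])
```

The learned policy is read off the table by taking the highest-valued action in each state, which is the sense in which Q-learning "tells an agent what action to take under what circumstances". Deep variants such as DQN replace the table with a neural network but keep the same update target.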

Papers

Showing 1801-1850 of 1918 papers

| Title | Status | Hype |
| --- | --- | --- |
| Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control | Code | 0 |
| Heuristics, Answer Set Programming and Markov Decision Process for Solving a Set of Spatial Puzzles | Code | 0 |
| QFlip: An Adaptive Reinforcement Learning Strategy for the FlipIt Security Game | Code | 0 |
| Decision Making in Non-Stationary Environments with Policy-Augmented Search | Code | 0 |
| Action Candidate Based Clipped Double Q-learning for Discrete and Continuous Action Tasks | Code | 0 |
| Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery | Code | 0 |
| Combining No-regret and Q-learning | Code | 0 |
| Playing Doom with SLAM-Augmented Deep Reinforcement Learning | Code | 0 |
| Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition | Code | 0 |
| Playing FPS Games with Deep Reinforcement Learning | Code | 0 |
| Regularized Q-learning through Robust Averaging | Code | 0 |
| Policy Learning for Malaria Control | Code | 0 |
| A DQN-based Approach to Finding Precise Evidences for Fact Verification | Code | 0 |
| EASpace: Enhanced Action Space for Policy Transfer | Code | 0 |
| Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations | Code | 0 |
| A Statistical Analysis of Polyak-Ruppert Averaged Q-learning | Code | 0 |
| Augmented Q Imitation Learning (AQIL) | Code | 0 |
| Superior Genetic Algorithms for the Target Set Selection Problem Based on Power-Law Parameter Choices and Simple Greedy Heuristics | Code | 0 |
| CytonRL: an Efficient Reinforcement Learning Open-source Toolkit Implemented in C++ | Code | 0 |
| Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods | Code | 0 |
| Combinational Q-Learning for Dou Di Zhu | Code | 0 |
| POPO: Pessimistic Offline Policy Optimization | Code | 0 |
| Crowd Intelligence for Early Misinformation Prediction on Social Media | Code | 0 |
| Deep Reinforcement Learning for Traffic Light Control in Vehicular Networks | Code | 0 |
| Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement | Code | 0 |
| Deep reinforcement learning for time series: playing idealized trading games | Code | 0 |
| Learning to Play Text-based Adventure Games with Maximum Entropy Reinforcement Learning | Code | 0 |
| Robotic Surgery With Lean Reinforcement Learning | Code | 0 |
| Practical Block-wise Neural Network Architecture Generation | Code | 0 |
| Implications of Decentralized Q-learning Resource Allocation in Wireless Networks | Code | 0 |
| Training Transition Policies via Distribution Matching for Complex Tasks | Code | 0 |
| Balancing Value Underestimation and Overestimation with Realistic Actor-Critic | Code | 0 |
| Improved robustness of reinforcement learning policies upon conversion to spiking neuronal network platforms applied to ATARI games | Code | 0 |
| A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization | Code | 0 |
| Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents | Code | 0 |
| Urban Driving with Multi-Objective Deep Reinforcement Learning | Code | 0 |
| Deep Reinforcement Learning for Optimal Stopping with Application in Financial Engineering | Code | 0 |
| Collaborative Multi-BS Power Management for Dense Radio Access Network using Deep Reinforcement Learning | Code | 0 |
| CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning | Code | 0 |
| Pre-training with Synthetic Data Helps Offline Reinforcement Learning | Code | 0 |
| Increasing the Action Gap: New Operators for Reinforcement Learning | Code | 0 |
| Understanding algorithmic collusion with experience replay | Code | 0 |
| Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization | Code | 0 |
| Information-Directed Exploration for Deep Reinforcement Learning | Code | 0 |
| A disembodied developmental robotic agent called Samu Bátfai | Code | 0 |
| Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments | Code | 0 |
| Information-Theoretic State Variable Selection for Reinforcement Learning | Code | 0 |
| VQC-Based Reinforcement Learning with Data Re-uploading: Performance and Trainability | Code | 0 |
| Mutual Information Regularized Offline Reinforcement Learning | Code | 0 |
| SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems | Code | 0 |
Page 37 of 39

No leaderboard results yet.