SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy that tells an agent which action to take in each state.

(Image credit: Playing Atari with Deep Reinforcement Learning)
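As a rough illustration of what "learning a policy" means here, the sketch below runs tabular Q-learning on a made-up five-state corridor. The environment, the `step`, `train`, and `greedy_policy` names, and all hyperparameters are assumptions chosen for the example; none of them come from the papers listed on this page.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy corridor)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 only when the right end is reached."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)            # explore
            else:
                best = max(q[state])                    # exploit, random tie-break
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next-state value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Action with the highest learned value in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy should choose "right" (action 1) in every non-terminal state, since that is the only way to reach the rewarding end of the corridor.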

Papers

Showing 51–100 of 1918 papers

Title | Status | Hype
Robust Q-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty | Code | 1
Revisiting Discrete Soft Actor-Critic | Code | 1
MAN: Multi-Action Networks Learning | Code | 1
A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games | Code | 1
Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning | Code | 1
Reinforced Lin-Kernighan-Helsgaun Algorithms for the Traveling Salesman Problems | Code | 1
On the Learning and Learnability of Quasimetrics | Code | 1
MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer | Code | 1
Sampling Efficient Deep Reinforcement Learning through Preference-Guided Stochastic Exploration | Code | 1
A Search-Based Testing Approach for Deep Reinforcement Learning Agents | Code | 1
Mildly Conservative Q-Learning for Offline Reinforcement Learning | Code | 1
CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning | Code | 1
GAIL-PT: A Generic Intelligent Penetration Testing Framework with Generative Adversarial Imitation Learning | Code | 1
Microservice Deployment in Edge Computing Based on Deep Q Learning | Code | 1
Addressing Maximization Bias in Reinforcement Learning with Two-Sample Testing | Code | 1
Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning | Code | 1
ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives | Code | 1
Regularized Softmax Deep Multi-Agent Q-Learning | Code | 1
Offline Reinforcement Learning with Implicit Q-Learning | Code | 1
Dropout Q-Functions for Doubly Efficient Reinforcement Learning | Code | 1
Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble | Code | 1
Learning the Markov Decision Process in the Sparse Gaussian Elimination | Code | 1
Offline Reinforcement Learning with In-sample Q-Learning | Code | 1
Deep Reinforcement Q-Learning for Intelligent Traffic Signal Control with Partial Detection | Code | 1
Backprop-Free Reinforcement Learning with Active Neural Generative Coding | Code | 1
Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation | Code | 1
Distilling Reinforcement Learning Tricks for Video Games | Code | 1
Towards self-organized control: Using neural cellular automata to robustly control a cart-pole agent | Code | 1
Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation | Code | 1
IQ-Learn: Inverse soft-Q Learning for Imitation | Code | 1
Distributed Heuristic Multi-Agent Path Finding with Communication | Code | 1
Efficient (Soft) Q-Learning for Text Generation with Limited Good Data | Code | 1
TempoRL: Learning When to Act | Code | 1
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning | Code | 1
SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning | Code | 1
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning | Code | 1
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation | Code | 1
Optimal Market Making by Reinforcement Learning | Code | 1
DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning | Code | 1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Code | 1
Acting in Delayed Environments with Non-Stationary Markov Policies | Code | 1
Randomized Ensembled Double Q-Learning: Learning Fast Without a Model | Code | 1
Simulating SQL Injection Vulnerability Exploitation Using Q-Learning Reinforcement Learning Agents | Code | 1
Multi-Agent Trust Region Learning | Code | 1
Combining Reinforcement Learning with Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problem | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Adaptive Contention Window Design using Deep Q-learning | Code | 1
Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls | Code | 1
Learning Guidance Rewards with Trajectory-space Smoothing | Code | 1
Multi-Agent Collaboration via Reward Attribution Decomposition | Code | 1
Page 2 of 39

No leaderboard results yet.