SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy that tells an agent which action to take in each state.
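As an illustration only, the following is a minimal sketch of tabular Q-learning, assuming a small discrete environment. The corridor environment `chain_step`, the function names, and all hyperparameter values are hypothetical and chosen for this example; they do not come from any paper listed on this page. The sketch keeps a table of action values, moves each entry toward the reward plus the discounted best value of the next state, and then reads the policy off the table by picking the highest-valued action in each state.

```python
import random


def chain_step(state, action):
    """Hypothetical 5-state corridor (states 0..4).

    Action 1 moves right, action 0 moves left (state 0 is a wall).
    Reaching state 4 yields reward 1 and ends the episode.
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


def q_learning(n_states, n_actions, step, episodes=1000, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Learn a table of action values with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0  # every episode starts at the left end (an assumption)
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a best action, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target: reward plus discounted best value of the next state
            # (no bootstrap term once the episode has ended).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Map each state to the action with the highest learned value."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
    print(greedy_policy(q_table))
```

The policy is never stored separately here: it is derived from the value table, which is why the greedy read-out at the end is enough to tell the agent what to do in each state. The entry for the terminal state carries no meaning, since no action is ever taken there.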

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 101–150 of 1918 papers

Title | Status | Hype
Deep Active Inference for Partially Observable MDPs | Code | 1
Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials | Code | 1
Towards Universal and Black-Box Query-Response Only Attack on LLMs with QROA | Code | 1
Randomized Ensembled Double Q-Learning: Learning Fast Without a Model | Code | 1
Reasoning with Latent Diffusion in Offline Reinforcement Learning | Code | 1
Regularized Softmax Deep Multi-Agent Q-Learning | Code | 1
Research on Robot Path Planning Based on Reinforcement Learning | Code | 1
CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning | Code | 1
Boosting Soft Actor-Critic: Emphasizing Recent Experience without Forgetting the Past | Code | 1
Reward Machines for Cooperative Multi-Agent Reinforcement Learning | Code | 1
Adaptive Contention Window Design using Deep Q-learning | Code | 1
Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation | Code | 1
Sampling Efficient Deep Reinforcement Learning through Preference-Guided Stochastic Exploration | Code | 1
A Search-Based Testing Approach for Deep Reinforcement Learning Agents | Code | 1
ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives | Code | 1
Backprop-Free Reinforcement Learning with Active Neural Generative Coding | Code | 1
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | Code | 1
Solving Continuous Control via Q-learning | Code | 1
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning | Code | 1
Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation | Code | 1
Automated Cloud Provisioning on AWS using Deep Reinforcement Learning | Code | 1
An Optimistic Perspective on Offline Reinforcement Learning | Code | 1
Table2Charts: Recommending Charts by Learning Shared Table Representations | Code | 1
TempoRL: Learning When to Act | Code | 1
Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error | Code | 1
Towards Robust Offline Reinforcement Learning under Diverse Data Corruption | Code | 1
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning | Code | 1
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning | Code | 1
An Optimistic Perspective on Offline Deep Reinforcement Learning | Code | 1
A Stochastic Game Framework for Efficient Energy Management in Microgrid Networks | Code | 1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Code | 1
Boosting Continuous Control with Consistency Policy | Code | 1
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
Benchmarking Batch Deep Reinforcement Learning Algorithms | Code | 1
Combining Reinforcement Learning with Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problem | Code | 1
Addressing Function Approximation Error in Actor-Critic Methods | Code | 1
Continuous Deep Q-Learning with Model-based Acceleration | Code | 1
Deep Inverse Q-learning with Constraints | Code | 1
FACMAC: Factored Multi-Agent Centralised Policy Gradients | Code | 1
Deep Reinforcement Learning-based Intelligent Traffic Signal Controls with Optimized CO2 emissions | Code | 1
Deep Reinforcement Learning with Double Q-learning | Code | 1
When should we prefer Decision Transformers for Offline Reinforcement Learning? | Code | 1
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | Code | 1
Discriminator Soft Actor Critic without Extrinsic Rewards | Code | 1
Distilling Reinforcement Learning Tricks for Video Games | Code | 1
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies | Code | 1
Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization | Code | 1
EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models | Code | 1
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning | Code | 1

No leaderboard results yet.