
Q-Learning

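The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances. It is a model-free, off-policy reinforcement learning algorithm: rather than modelling the environment, it learns an action-value function Q(s, a), the expected discounted return of taking action a in state s and acting optimally afterwards, and the policy then chooses the action with the highest Q-value in each state.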

(Image credit: Playing Atari with Deep Reinforcement Learning)
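A minimal tabular sketch of the idea, assuming a small discrete environment. The agent keeps a table of action values Q(s, a), improves it with the standard Q-learning update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') - Q(s, a)], and reads off the policy by taking the highest-valued action in each state. The corridor environment, the helper names (`step`, `greedy`, `q_learning`, `policy`) and the hyperparameter values are hypothetical choices for illustration, not taken from any of the listed papers.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 1-D corridor
# of N_STATES cells. The agent starts in cell 0; reaching the last cell yields
# reward 1.0 and ends the episode. Actions: 0 = move left, 1 = move right.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q, state):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[state])
    return random.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    random.seed(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = greedy(q, state)
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action,
            # or use the reward alone if the episode has terminated.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def policy(q):
    """Derive the greedy policy (one action per non-terminal state)."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


q_table = q_learning()
print(policy(q_table))  # expected: [1, 1, 1, 1] (always move right)
```

The max over next actions in the target is what makes Q-learning off-policy: the table is updated toward the greedy policy even while the agent explores with epsilon-greedy actions. Deep Q-learning, as in the Atari work credited above, replaces the table with a neural network that approximates Q(s, a).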

Papers

Showing 101-150 of 1918 papers

Title | Status | Hype
Extreme Q-Learning: MaxEnt RL without Entropy | Code | 1
Research on Robot Path Planning Based on Reinforcement Learning | Code | 1
Evolution Strategies as a Scalable Alternative to Reinforcement Learning | Code | 1
FACMAC: Factored Multi-Agent Centralised Policy Gradients | Code | 1
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning | Code | 1
Robust Deep Reinforcement Learning through Adversarial Loss | Code | 1
Energy-based Surprise Minimization for Multi-Agent Value Factorization | Code | 1
Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning | Code | 1
A Search-Based Testing Approach for Deep Reinforcement Learning Agents | Code | 1
Semantic Visual Navigation by Watching YouTube Videos | Code | 1
Adaptive Contention Window Design using Deep Q-learning | Code | 1
Deep Reinforcement Learning-based Intelligent Traffic Signal Controls with Optimized CO2 emissions | Code | 1
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor | Code | 1
Solving Continuous Control via Q-learning | Code | 1
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards | Code | 1
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning | Code | 1
Strategically Conservative Q-Learning | Code | 1
Distributed Heuristic Multi-Agent Path Finding with Communication | Code | 1
Deep Reinforcement Learning with Double Q-learning | Code | 1
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction | Code | 1
Deep Reinforcement Q-Learning for Intelligent Traffic Signal Control with Partial Detection | Code | 1
TempoRL: Learning When to Act | Code | 1
DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning | Code | 1
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning | Code | 1
Diffusion Policies creating a Trust Region for Offline Reinforcement Learning | Code | 1
Dropout Q-Functions for Doubly Efficient Reinforcement Learning | Code | 1
GAIL-PT: A Generic Intelligent Penetration Testing Framework with Generative Adversarial Imitation Learning | Code | 1
Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning | Code | 1
An Optimistic Perspective on Offline Deep Reinforcement Learning | Code | 1
Multi-Agent Reinforcement Learning via Distributed MPC as a Function Approximator | Code | 1
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning | Code | 1
EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models | Code | 1
Towards Universal and Black-Box Query-Response Only Attack on LLMs with QROA | Code | 1
FlapAI Bird: Training an Agent to Play Flappy Bird Using Reinforcement Learning Techniques | Code | 1
Gradient Temporal-Difference Learning with Regularized Corrections | Code | 1
A Stochastic Game Framework for Efficient Energy Management in Microgrid Networks | Code | 1
Addressing Function Approximation Error in Actor-Critic Methods | Code | 1
HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation | Code | 1
Automated Cloud Provisioning on AWS using Deep Reinforcement Learning | Code | 1
Backprop-Free Reinforcement Learning with Active Neural Generative Coding | Code | 1
Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver? | Code | 1
IQ-Learn: Inverse soft-Q Learning for Imitation | Code | 1
When should we prefer Decision Transformers for Offline Reinforcement Learning? | Code | 1
Benchmarking Batch Deep Reinforcement Learning Algorithms | Code | 1
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning | Code | 1
Learning the Markov Decision Process in the Sparse Gaussian Elimination | Code | 1
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19 | Code | 1
Boosting Continuous Control with Consistency Policy | Code | 1
MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer | Code | 1
Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning | Code | 1

No leaderboard results yet.