SOTAVerified

Q-Learning

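Q-learning is a model-free reinforcement learning algorithm. Its goal is to learn a policy that tells an agent which action to take in each state, which it does by estimating the expected return (the Q-value) of every state-action pair.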

(Image credit: Playing Atari with Deep Reinforcement Learning)
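
As a rough illustration of the idea described above, here is a minimal sketch of tabular Q-learning. It is not taken from any paper listed on this page. The five-state corridor environment, the function names (`step`, `train`, `greedy_policy`), and the hyperparameter values are all assumptions chosen for the example.

```python
import random

# Hypothetical toy environment (an assumption for this sketch):
# a corridor of 5 cells; the agent starts in cell 0 and the goal is cell 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action; reaching the goal pays reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration (illustrative settings)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action,
            # regardless of which action the agent will actually take next.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read the learned policy off the table: the highest-valued action per state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

The policy is never stored separately here: it is recovered from the Q-table by picking the highest-valued action in each state. Deep variants such as the one in the credited Atari paper replace the table with a neural network, which this sketch does not attempt.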

Papers

Showing 751-775 of 1918 papers

| Title | Status | Hype |
| --- | --- | --- |
| Goal-Conditioned Q-Learning as Knowledge Distillation | Code | 0 |
| Object Goal Navigation using Data Regularized Q-Learning | | 0 |
| Prospect Theory-inspired Automated P2P Energy Trading with Q-learning-based Dynamic Pricing | | 0 |
| Recurrent Neural Network-based Anti-jamming Framework for Defense Against Multiple Jamming Policies | | 0 |
| Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning | Code | 2 |
| A Novel Resource Allocation for Anti-jamming in Cognitive-UAVs: an Active Inference Approach | | 0 |
| Compositional Reinforcement Learning for Discrete-Time Stochastic Control Systems | | 0 |
| Reinforcement Learning for Joint V2I Network Selection and Autonomous Driving Policies | | 0 |
| Digital Twin-Assisted Efficient Reinforcement Learning for Edge Task Scheduling | | 0 |
| Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach | Code | 0 |
| A Maintenance Planning Framework using Online and Offline Deep Reinforcement Learning | | 0 |
| Playing a 2D Game Indefinitely using NEAT and Reinforcement Learning | | 0 |
| Structural Similarity for Improved Transfer in Reinforcement Learning | | 0 |
| Finite-Time Analysis of Asynchronous Q-learning under Diminishing Step-Size from Control-Theoretic View | | 0 |
| Multi-Source AoI-Constrained Resource Minimization under HARQ: Heterogeneous Sampling Processes | | 0 |
| On Decentralizing Federated Reinforcement Learning in Multi-Robot Scenarios | | 0 |
| A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games | Code | 1 |
| DDPG Learning for Aerial RIS-Assisted MU-MISO Communications | | 0 |
| Approximate Nash Equilibrium Learning for n-Player Markov Games in Dynamic Pricing | | 0 |
| Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning | Code | 1 |
| Reinforced Lin-Kernighan-Helsgaun Algorithms for the Traveling Salesman Problems | Code | 1 |
| Multi-objective Optimization of Notifications Using Offline Reinforcement Learning | | 0 |
| Planning with RL and episodic-memory behavioral priors | | 0 |
| q-Learning in Continuous Time | | 0 |
| Interactive Learning from Natural Language and Demonstrations using Signal Temporal Logic | | 0 |

No leaderboard results yet.