SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy, which tells an agent what action to take under what circumstances.
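
As a minimal sketch of how such a policy can be learned, the tabular example below applies the standard Q-learning update, Q(s, a) ← Q(s, a) + α · (r + γ · max Q(s', ·) − Q(s, a)), with epsilon-greedy exploration. The environment (`chain_step`, a five-state corridor), the function names, and all hyperparameter values are assumptions made for illustration; they do not come from any of the papers listed on this page.

```python
import random


def greedy_action(q_row, rng):
    """Return an action with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative sketch)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0  # assumption: every episode starts in state 0
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value after a terminal transition.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def chain_step(state, action):
    """Hypothetical corridor: action 1 moves right, action 0 moves left.

    Reaching state 4 yields reward 1 and ends the episode.
    """
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return next_state, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=chain_step)
# The learned policy is the greedy action in each non-terminal state.
policy = [max(range(2), key=q_table[s].__getitem__) for s in range(4)]
print(policy)
```

In this toy setting the greedy policy read off the table should move right in every non-terminal state, since that is the shortest path to the only reward. The deep variants that appear in the paper list below (for example DQN) replace the table with a neural network but keep the same bootstrapped target.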

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 401–450 of 1918 papers

Title | Status | Hype
Decentralized Semantic Traffic Control in AVs Using RL and DQN for Dynamic Roadblocks | | 0
MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention | | 0
EduQate: Generating Adaptive Curricula through RMABs in Education Settings | | 0
Equivariant Offline Reinforcement Learning | | 0
Learning to Select Goals in Automated Planning with Deep-Q Learning | | 0
A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms | | 0
Reinforcement-Learning based routing for packet-optical networks with hybrid telemetry | Code | 0
Optimal Transport-Assisted Risk-Sensitive Q-Learning | | 0
Catalytic evolution of cooperation in a population with behavioural bimodality | | 0
Finite-Time Analysis of Simultaneous Double Q-learning | | 0
Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning | | 0
Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors | | 0
Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker–Planck Equation | Code | 0
Online Frequency Scheduling by Learning Parallel Actions | | 0
Fast-Fading Channel and Power Optimization of the Magnetic Inductive Cellular Network | | 0
Stabilizing Extreme Q-learning by Maclaurin Expansion | Code | 0
Bootstrapping Expectiles in Reinforcement Learning | | 0
Age of Trust (AoT): A Continuous Verification Framework for Wireless Networks | | 0
Tabular and Deep Learning for the Whittle Index | | 0
Algorithmic Collusion in Dynamic Pricing with Deep Reinforcement Learning | | 0
How to discretize continuous state-action spaces in Q-learning: A symbolic control approach | | 0
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation | Code | 0
Q-learning as a monotone scheme | | 0
Approximate Global Convergence of Independent Learning in Multi-Agent Systems | | 0
Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost | | 0
Imitating from auxiliary imperfect demonstrations via Adversarial Density Weighted Regression | Code | 0
Mutation-Bias Learning in Games | | 0
Highway Reinforcement Learning | | 0
AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization | Code | 0
Analysis of Multiscale Reinforcement Q-Learning Algorithms for Mean Field Control Games | | 0
Reinforcement Learning for Jump-Diffusions, with Financial Applications | | 0
An Evolutionary Framework for Connect-4 as Test-Bed for Comparison of Advanced Minimax, Q-Learning and MCTS | | 0
Knowledge-Informed Auto-Penetration Testing Based on Reinforcement Learning with Reward Machine | | 0
Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning | | 0
SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning | | 0
A finite time analysis of distributed Q-learning | | 0
Exclusively Penalized Q-learning for Offline Reinforcement Learning | | 0
Learning To Play Atari Games Using Dueling Q-Learning and Hebbian Plasticity | Code | 0
Stochastic Q-learning for Large Discrete Action Spaces | | 0
Deep Reinforcement Learning for Real-Time Ground Delay Program Revision and Corresponding Flight Delay Assignments | | 0
Smart Sampling: Self-Attention and Bootstrapping for Improved Ensembled Q-Learning | | 0
An Initial Introduction to Cooperative Multi-Agent Reinforcement Learning | | 0
An Overview of Machine Learning-Enabled Optimization for Reconfigurable Intelligent Surfaces-Aided 6G Networks: From Reinforcement Learning to Large Language Models | | 0
SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems | Code | 0
Enhancing Q-Learning with Large Language Model Heuristics | | 0
Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach | | 0
A Network Simulation of OTC Markets with Multiple Agents | | 0
Regularized Q-learning through Robust Averaging | Code | 0
LOQA: Learning with Opponent Q-Learning Awareness | | 0
Cell Switching in HAPS-Aided Networking: How the Obscurity of Traffic Loads Affects the Decision | | 0
Page 9 of 39

No leaderboard results yet.