SOTAVerified

Q-Learning

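The goal of Q-learning is to learn a policy that tells an agent which action to take in each state. It does so by estimating an action-value function Q(s, a) and choosing, in each state, the action with the highest estimated value.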
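As a rough illustration of how such a policy can be learned, here is a minimal sketch of tabular Q-learning. The five-state chain environment, the `step` helper, and all hyperparameter values are assumptions made up for this example; they do not come from any of the papers listed on this page.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: reward 1.0 only on reaching the last state."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise act greedily,
            # breaking ties between equally valued actions at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next-state value
            # (no bootstrap term once the episode has terminated).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read the policy off the table: best action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(q_learning())` returns one action per non-terminal state. On this toy chain the learned policy should move right everywhere, since that is the only way to reach the rewarding state.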

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 151-200 of 1918 papers

| Title | Status | Hype |
| --- | --- | --- |
| Regret of exploratory policy improvement and q-learning | | 0 |
| HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search | | 0 |
| Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis | Code | 0 |
| Zonal RL-RRT: Integrated RL-RRT Path Planning with Collision Probability and Zone Connectivity | Code | 1 |
| Offline Reinforcement Learning and Sequence Modeling for Downlink Link Adaptation | | 0 |
| Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem | | 0 |
| Q-Distribution guided Q-learning for offline reinforcement learning: Uncertainty penalized Q-value via consistency model | Code | 0 |
| Optimizing Load Scheduling in Power Grids Using Reinforcement Learning and Markov Decision Processes | | 0 |
| A Novel Reinforcement Learning Model for Post-Incident Malware Investigations | | 0 |
| Streaming Deep Reinforcement Learning Finally Works | Code | 3 |
| Reward-free World Models for Online Imitation Learning | Code | 1 |
| Multi-Objective-Optimization Multi-AUV Assisted Data Collection Framework for IoUT Based on Offline Reinforcement Learning | | 0 |
| MFC-EQ: Mean-Field Control with Envelope Q-Learning for Moving Decentralized Agents in Formation | | 0 |
| Learning Agents With Prioritization and Parameter Noise in Continuous State and Action Space | | 0 |
| Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search | | 0 |
| DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation | | 0 |
| Diffusion-Based Offline RL for Improved Decision-Making in Augmented ARC Task | | 0 |
| Asymptotic Analysis of Sample-averaged Q-learning | | 0 |
| Online waveform selection for cognitive radar | | 0 |
| Hybrid LLM-DDQN based Joint Optimization of V2I Communication and Autonomous Driving | | 0 |
| UNIQ: Offline Inverse Q-learning for Avoiding Undesirable Demonstrations | Code | 0 |
| Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition | | 0 |
| VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | | 0 |
| Optimized Resource Allocation for Cloud-Native 6G Networks: Zero-Touch ML Models in Microservices-based VNF Deployments | | 0 |
| Q-WSL: Optimizing Goal-Conditioned RL with Weighted Supervised Learning via Dynamic Programming | | 0 |
| Learning in complex action spaces without policy gradients | | 0 |
| Reinforcement Learning-Aided NOMA Random Access: An AoI-Based Timeliness Perspective | | 0 |
| Mimicking Human Intuition: Cognitive Belief-Driven Q-Learning | | 0 |
| Adaptive Knowledge-based Multi-Objective Evolutionary Algorithm for Hybrid Flow Shop Scheduling Problems with Multiple Parallel Batch Processing Stages | | 0 |
| Reinforcement Learning for Finite Space Mean-Field Type Games | | 0 |
| Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment | | 0 |
| Agent-state based policies in POMDPs: Beyond belief-state MDPs | | 0 |
| A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization | Code | 0 |
| Learning to Play Video Games with Intuitive Physics Priors | | 0 |
| Data-Efficient Quadratic Q-Learning Using LMIs | | 0 |
| Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning | | 0 |
| Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling | Code | 0 |
| Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments | Code | 0 |
| SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning | | 0 |
| KAN v.s. MLP for Offline Reinforcement Learning | | 0 |
| Autonomous Vehicle Decision-Making Framework for Considering Malicious Behavior at Unsignalized Intersections | | 0 |
| Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning | Code | 0 |
| Reinforcement Learning for Rate Maximization in IRS-aided OWC Networks | | 0 |
| Reward-Directed Score-Based Diffusion Models via q-Learning | | 0 |
| Faster Q-Learning Algorithms for Restless Bandits | | 0 |
| Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | | 0 |
| On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments | | 0 |
| Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning | | 0 |
| Robust Q-Learning under Corrupted Rewards | Code | 0 |
| Reinforcement Learning-enabled Satellite Constellation Reconfiguration and Retasking for Mission-Critical Applications | | 0 |
Page 4 of 39

No leaderboard results yet.