
Q-Learning

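Q-learning is a model-free reinforcement learning algorithm. Its goal is to learn a policy that tells an agent which action to take in each state, which it does by estimating the value of every state-action pair.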
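
As a rough illustration of how such a policy can be learned, the sketch below runs tabular Q-learning on a made-up one-dimensional corridor. The environment, the function names `q_learning` and `greedy_policy`, and all hyperparameter values are assumptions chosen for this example; none of them come from the papers listed on this page.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning on a hypothetical corridor environment.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    # One row per state, one column per action, initialised to zero.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if state == goal:
                break
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Deterministic transition (assumed toy dynamics).
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: bootstrap from the best next action,
            # except when the next state is terminal.
            future = 0.0 if next_state == goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for every state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table))
```

The learned table is turned into a policy by picking the highest-valued action in each state. In this toy setting, that policy should prefer moving right in every non-terminal state.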

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 326-350 of 1918 papers

Title | Status | Hype
Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment | | 0
Agent-state based policies in POMDPs: Beyond belief-state MDPs | | 0
A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization | Code | 0
Learning to Play Video Games with Intuitive Physics Priors | | 0
Data-Efficient Quadratic Q-Learning Using LMIs | | 0
Automating proton PBS treatment planning for head and neck cancers using policy gradient-based deep reinforcement learning | | 0
Audio-Driven Reinforcement Learning for Head-Orientation in Naturalistic Environments | Code | 0
Offline Reinforcement Learning for Learning to Dispatch for Job Shop Scheduling | Code | 0
SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning | | 0
KAN v.s. MLP for Offline Reinforcement Learning | | 0
Autonomous Vehicle Decision-Making Framework for Considering Malicious Behavior at Unsignalized Intersections | | 0
Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning | Code | 0
Reinforcement Learning for Rate Maximization in IRS-aided OWC Networks | | 0
Reward-Directed Score-Based Diffusion Models via q-Learning | | 0
Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | | 0
Faster Q-Learning Algorithms for Restless Bandits | | 0
Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning | | 0
On the Convergence Rates of Federated Q-Learning across Heterogeneous Environments | | 0
Robust Q-Learning under Corrupted Rewards | Code | 0
Reinforcement Learning-enabled Satellite Constellation Reconfiguration and Retasking for Mission-Critical Applications | | 0
Accelerated Multi-objective Task Learning using Modified Q-learning Algorithm | | 0
Imitating Language via Scalable Inverse Reinforcement Learning | | 0
The Sample-Communication Complexity Trade-off in Federated Q-Learning | | 0
On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes | | 0
Coverage Analysis of Multi-Environment Q-Learning Algorithms for Wireless Network Optimization | | 0
Page 14 of 77

No leaderboard results yet.