SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy that tells an agent which action to take in each state.
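As a minimal sketch of how such a policy can be learned, the example below runs tabular Q-learning on a hypothetical five-state corridor. The environment, the function names (`q_learning`, `greedy_policy`), and the hyperparameter values are illustrative assumptions and are not taken from any paper listed on this page.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left, action 1 moves right.
    Reaching the rightmost state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy action choice, with random tie-breaking.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Deterministic transition; the left wall blocks movement.
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0

            # Q-learning update: move Q(s, a) toward the bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every state."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

The learned table maps each state to action values, and the policy is read off by taking the best action per state. In this toy corridor the greedy action in every non-terminal state should be "move right"; the entry for the terminal state is never updated and carries no meaning.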

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 151-200 of 1918 papers

Title | Status | Hype
Continuous control with deep reinforcement learning | Code | 1
Deep Recurrent Q-Learning for Partially Observable MDPs | Code | 1
Playing Atari with Deep Reinforcement Learning | Code | 1
Evaluating Reinforcement Learning Algorithms for Navigation in Simulated Robotic Quadrupeds: A Comparative Study Inspired by Guide Dog Behaviour | - | 0
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Code | 0
A Data-Ensemble-Based Approach for Sample-Efficient LQ Control of Linear Time-Varying Systems | - | 0
ADDQ: Adaptive Distributional Double Q-Learning | Code | 0
Reinforcement Learning-Based Policy Optimisation For Heterogeneous Radio Access | - | 0
ReinDSplit: Reinforced Dynamic Split Learning for Pest Recognition in Precision Agriculture | - | 0
Implicit Constraint-Aware Off-Policy Correction for Offline Reinforcement Learning | - | 0
"What are my options?": Explaining RL Agents with Diverse Near-Optimal Alternatives (Extended) | - | 0
Q-learning-based Hierarchical Cooperative Local Search for Steelmaking-continuous Casting Scheduling Problem | - | 0
Regret-Optimal Q-Learning with Low Cost for Single-Agent and Federated Reinforcement Learning | - | 0
Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning With Iterated Q-Learning | - | 0
Improving Performance of Spike-based Deep Q-Learning using Ternary Neurons | - | 0
Reinforcement Learning for Hanabi | - | 0
Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model | - | 0
On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment | - | 0
Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents | - | 0
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL | - | 0
A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging | - | 0
Inverse Q-Learning Done Right: Offline Imitation Learning in Q^π-Realizable MDPs | Code | 0
Distributionally Robust Deep Q-Learning | Code | 0
Reinforcement Learning for Stock Transactions | - | 0
Offline Guarded Safe Reinforcement Learning for Medical Treatment Optimization Strategies | - | 0
OPA-Pack: Object-Property-Aware Robotic Bin Packing | - | 0
When a Reinforcement Learning Agent Encounters Unknown Unknowns | - | 0
Imagination-Limited Q-Learning for Offline Reinforcement Learning | - | 0
Automatic Reward Shaping from Confounded Offline Data | - | 0
ShiQ: Bringing back Bellman to LLMs | - | 0
Bias or Optimality? Disentangling Bayesian Inference and Learning Biases in Human Decision-Making | - | 0
Convert Language Model into a Value-based Strategic Planner | - | 0
Universal Approximation Theorem for Deep Q-Learning via FBSDE System | - | 0
A Large Language Model-Enhanced Q-learning for Capacitated Vehicle Routing Problem with Time Windows | - | 0
A critical assessment of reinforcement learning methods for microswimmer navigation in complex flows | Code | 0
Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation | - | 0
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making | - | 0
Meta-Black-Box-Optimization through Offline Q-function Learning | Code | 0
Universal Approximation Theorem of Deep Q-Networks | - | 0
Rank-One Modified Value Iteration | - | 0
Dynamic and Distributed Routing in IoT Networks based on Multi-Objective Q-Learning | - | 0
Learning Neural Control Barrier Functions from Offline Data with Conservatism | - | 0
Q-Learning with Clustered-SMART (cSMART) Data: Examining Moderators in the Construction of Clustered Adaptive Interventions | - | 0
Interactive Double Deep Q-network: Integrating Human Interventions and Evaluative Predictions in Reinforcement Learning of Autonomous Driving | - | 0
Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes | - | 0
SAPO-RL: Sequential Actuator Placement Optimization for Fuselage Assembly via Reinforcement Learning | - | 0
Mixed-Precision Conjugate Gradient Solvers with RL-Driven Precision Tuning | - | 0
Understanding the theoretical properties of projected Bellman equation, linear Q-learning, and approximate value iteration | - | 0
Nash Equilibrium Between Consumer Electronic Devices and DoS Attacker for Distributed IoT-enabled RSE Systems | - | 0
A Framework of decision-relevant observability: Reinforcement Learning converges under relative ignorability | - | 0

No leaderboard results yet.