SOTAVerified

Q-Learning

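The goal of Q-learning is to learn a policy that tells an agent which action to take in each state. It does so by estimating an action-value function Q(s, a): the expected discounted return from taking action a in state s and acting optimally afterwards.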

(Image credit: Playing Atari with Deep Reinforcement Learning)
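
To make the description above concrete, here is a minimal sketch of tabular Q-learning. It is not taken from any of the papers listed on this page. The environment is a hypothetical five-state chain (actions: 0 = left, 1 = right, reward 1 on reaching the last state), and the names `q_learning` and `greedy_policy` and all hyperparameter values are illustrative assumptions.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain environment.

    States are 0..n_states-1; the last state is terminal.
    Action 0 moves left (clamped at state 0), action 1 moves right.
    """
    rng = random.Random(seed)
    # One row per state, one column per action; terminal row stays at zero.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Epsilon-greedy action selection, breaking ties at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # Deterministic transition and reward of the toy environment.
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == n_states - 1 else 0.0

            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the greedy action for every non-terminal state."""
    return [max(range(2), key=lambda a: row[a]) for row in q[:-1]]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

In this toy setting the learned policy is read off the table by taking the highest-valued action in each state, which is the sense in which Q-learning "learns a policy". Deep variants replace the table with a function approximator.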

Papers

Showing 776-800 of 1918 papers

| Title | Status | Hype |
| --- | --- | --- |
| Fitted Q-Learning for Relational Domains | | 0 |
| Learning in Discounted-cost and Average-cost Mean-field Games | | 0 |
| Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning | | 0 |
| Balancing a CartPole System with Reinforcement Learning -- A Tutorial | | 0 |
| ShiQ: Bringing back Bellman to LLMs | | 0 |
| Floyd-Warshall Reinforcement Learning: Learning from Past Experiences to Reach New Goals | | 0 |
| FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game | | 0 |
| Balanced Q-learning: Combining the Influence of Optimistic and Pessimistic Targets | | 0 |
| Deep Surrogate Q-Learning for Autonomous Driving | | 0 |
| FRAC-Q-Learning: A Reinforcement Learning with Boredom Avoidance Processes for Social Robots | | 0 |
| Almost Sure Convergence Rates and Concentration of Stochastic Approximation and Reinforcement Learning with Markovian Noise | | 0 |
| From r to Q^*: Your Language Model is Secretly a Q-Function | | 0 |
| Continuous Deep Q-Learning in Optimal Control Problems: Normalized Advantage Functions Analysis | | 0 |
| Harnessing Deep Q-Learning for Enhanced Statistical Arbitrage in High-Frequency Trading: A Comprehensive Exploration | | 0 |
| Full Gradient Deep Reinforcement Learning for Average-Reward Criterion | | 0 |
| Functional Stability of Discounted Markov Decision Processes Using Economic MPC Dissipativity Theory | | 0 |
| HAVER: Instance-Dependent Error Bounds for Maximum Mean Estimation and Applications to Q-Learning and Monte Carlo Tree Search | | 0 |
| Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy | | 0 |
| Gap-Dependent Bounds for Federated Q-learning | | 0 |
| Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition | | 0 |
| Gap-Dependent Bounds for Two-Player Markov Games | | 0 |
| GenCos' Behaviors Modeling Based on Q Learning Improved by Dichotomy | | 0 |
| Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty | | 0 |
| Hidden Incentives for Auto-Induced Distributional Shift | | 0 |
| Deep Spectral Q-learning with Application to Mobile Health | | 0 |

No leaderboard results yet.