SOTAVerified

Q-Learning

The goal of Q-learning is to learn a policy that tells an agent which action to take in each state.

(Image credit: Playing Atari with Deep Reinforcement Learning)
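As a rough illustration of what learning such a policy can look like, here is a minimal tabular Q-learning sketch. The corridor environment, the function names `train_q_table` and `greedy_policy`, and all hyperparameter values are assumptions made for this example and do not come from any paper listed on this page.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. Action 0 moves left (state 0 is a wall),
    action 1 moves right. Reaching the rightmost state yields reward 1
    and ends the episode; every other step yields reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy action selection; ties are broken at random
            # so the untrained agent still wanders toward the goal.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # The terminal state has no future value.
            best_next = 0.0 if next_state == goal else max(q[next_state])

            # Q-learning update: move Q(s, a) toward the bootstrapped target
            # reward + gamma * max_a' Q(s', a').
            target = reward + gamma * best_next
            q[state][action] += alpha * (target - q[state][action])
            state = next_state

    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    q_table = train_q_table()
    print(greedy_policy(q_table))
```

The learned policy is simply the action with the largest Q-value in each state. The entry for the terminal state is never updated, so its greedy action carries no meaning.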

Papers

Showing 1176-1200 of 1918 papers

| Title | Status | Hype |
| --- | --- | --- |
| Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation | | 0 |
| Solving The Lunar Lander Problem under Uncertainty using Reinforcement Learning | Code | 0 |
| Learning Principle of Least Action with Reinforcement Learning | Code | 0 |
| Multi-Agent Reinforcement Learning for Markov Routing Games: A New Modeling Paradigm For Dynamic Traffic Assignment | | 0 |
| Provable Multi-Objective Reinforcement Learning with Generative Models | | 0 |
| Adaptive Contention Window Design using Deep Q-learning | Code | 1 |
| C-Learning: Learning to Achieve Goals via Recursive Classification | | 0 |
| Constrained Model-Free Reinforcement Learning for Process Optimization | | 0 |
| A deep Q-Learning based Path Planning and Navigation System for Firefighting Environments | | 0 |
| On Using Hamiltonian Monte Carlo Sampling for Reinforcement Learning Problems in High-dimension | | 0 |
| Reinforced Deep Markov Models With Applications in Automatic Trading | | 0 |
| Multi-Agent Reinforcement Learning for Channel Assignment and Power Allocation in Platoon-Based C-V2X Systems | | 0 |
| Reinforcement Learning for Assignment problem | | 0 |
| A Hysteretic Q-learning Coordination Framework for Emerging Mobility Systems in Smart Cities | | 0 |
| Control with adaptive Q-learning | Code | 0 |
| Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings | Code | 0 |
| DeepFoldit -- A Deep Reinforcement Learning Neural Network Folding Proteins | | 0 |
| Finite-Time Convergence Rates of Decentralized Stochastic Approximation with Applications in Multi-Agent and Multi-Task Learning | | 0 |
| Learning Time Reduction Using Warm Start Methods for a Reinforcement Learning Based Supervisory Control in Hybrid Electric Vehicle Applications | | 0 |
| Energy Consumption and Battery Aging Minimization Using a Q-learning Strategy for a Battery/Ultracapacitor Electric Vehicle | | 0 |
| Hamilton-Jacobi Deep Q-Learning for Deterministic Continuous-Time Systems with Lipschitz Continuous Controls | Code | 1 |
| Energy and Service-priority aware Trajectory Design for UAV-BSs using Double Q-Learning | | 0 |
| Enhancing reinforcement learning by a finite reward response filter with a case study in intelligent structural control | | 0 |
| An Adiabatic Theorem for Policy Tracking with TD-learning | | 0 |
| Learning Guidance Rewards with Trajectory-space Smoothing | Code | 1 |

No leaderboard results yet.