SOTAVerified

Q-Learning

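Q-learning is a model-free reinforcement learning algorithm whose goal is to learn a policy that tells an agent which action to take in each state. It does so by estimating an action-value function Q(s, a) and, in each state, choosing the action with the highest estimated value.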
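As a rough illustration of how such a policy can be learned, here is a minimal sketch of tabular Q-learning in Python. The corridor environment, the `step`, `train`, and `greedy_policy` helpers, and the hyperparameter values are all hypothetical choices made for this example; none of them come from the papers listed on this page.

```python
import random

# Hypothetical toy environment (an assumption for illustration only):
# a 1-D corridor of N_STATES cells. The agent starts in cell 0, and
# reaching the last cell yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        nxt = max(0, state - 1)
    else:
        nxt = min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus the discounted value of the
            # best action in the next state (zero if the episode ended).
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_policy(q):
    """Derive a policy: the highest-valued action in each state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


q_table = train()
policy = greedy_policy(q_table)
```

In this toy setting the learned greedy policy should move right in every non-terminal cell. The last cell is terminal and never updated, so its entry in `policy` carries no meaning.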

(Image credit: Playing Atari with Deep Reinforcement Learning)

Papers

Showing 1826-1850 of 1918 papers

Title | Status | Hype
A Comparison of Reinforcement Learning Techniques for Fuzzy Cloud Auto-Scaling | - | 0
Identification and Off-Policy Learning of Multiple Objectives Using Adaptive Clustering | - | 0
Learning to Represent Haptic Feedback for Partially-Observable Tasks | - | 0
Learning Hard Alignments with Variational Inference | - | 0
Discrete Sequential Prediction of Continuous Actions for Deep RL | - | 0
Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space -- Fundamental Theory and Methods | Code | 0
Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning | - | 0
Equivalence Between Policy Gradients and Soft Q-Learning | - | 0
Reinforcement Learning with External Knowledge and Two-Stage Q-functions for Predicting Popular Reddit Threads | - | 0
Deep Q-learning from Demonstrations | Code | 0
Data-efficient Deep Reinforcement Learning for Dexterous Manipulation | - | 0
Pseudorehearsal in value function approximation | - | 0
Online Learning for Offloading and Autoscaling in Energy Harvesting Mobile Edge Computing | - | 0
Evolution Strategies as a Scalable Alternative to Reinforcement Learning | Code | 1
Multi-step Reinforcement Learning: A Unifying Algorithm | - | 0
Bridging the Gap Between Value and Policy Based Reinforcement Learning | - | 0
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning | Code | 1
Reinforcement Learning with Deep Energy-Based Policies | Code | 0
Learning Control for Air Hockey Striking using Deep Reinforcement Learning | - | 0
Collaborative Deep Reinforcement Learning for Joint Object Search | - | 0
The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI | - | 0
FPGA Architecture for Deep Learning and its application to Planetary Robotics | - | 0
Learning to predict where to look in interactive environments using deep recurrent q-learning | - | 0
Playing Doom with SLAM-Augmented Deep Reinforcement Learning | Code | 0
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | Code | 0

No leaderboard results yet.