SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
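The loop described above (act, receive a reward, update the policy) can be sketched with tabular Q-learning on a toy problem. This is a minimal illustration under stated assumptions, not a method taken from any paper listed here: the five-state corridor environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are hypothetical choices made for the example.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Toy environment: reward 1.0 on reaching the goal, 0.0 otherwise."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])             # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Best action per non-terminal state according to the learned values."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q[:-1]]
```

Calling `greedy_policy(train())` returns one action per non-terminal state. In this toy setting the learned policy is expected to move right, toward the rewarding goal state, which is the "policy that maximizes long-term reward" in miniature.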

Papers

Showing 14851-14900 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Inverse Reinforcement Learning with Locally Consistent Reward Functions | | 0 |
| On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models | | 0 |
| Robotic Search & Rescue via Online Multi-task Reinforcement Learning | | 0 |
| Reinforcement Learning Applied to an Electric Water Heater: From Theory to Practice | | 0 |
| Multiagent Cooperation and Competition with Deep Reinforcement Learning | Code | 1 |
| On the convergence of cycle detection for navigational reinforcement learning | | 0 |
| Strategic Dialogue Management via Deep Reinforcement Learning | Code | 0 |
| MazeBase: A Sandbox for Learning from Games | Code | 0 |
| Dueling Network Architectures for Deep Reinforcement Learning | Code | 0 |
| Conditional Computation in Neural Networks for faster models | Code | 0 |
| Policy Distillation | Code | 0 |
| Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning | Code | 0 |
| Active Object Localization with Deep Reinforcement Learning | Code | 0 |
| Prioritized Experience Replay | Code | 1 |
| Deep Reinforcement Learning with a Natural Language Action Space | Code | 0 |
| Deep Reinforcement Learning in Parameterized Action Space | Code | 1 |
| Towards Vision-Based Deep Reinforcement Learning for Robotic Motion Control | | 0 |
| Doubly Robust Off-policy Value Evaluation for Reinforcement Learning | | 0 |
| Learning Unfair Trading: a Market Manipulation Analysis From the Reinforcement Learning Perspective | | 0 |
| Generating Text with Deep Reinforcement Learning | | 0 |
| Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning | | 0 |
| On the Computability of AIXI | | 0 |
| Dual Control for Approximate Bayesian Reinforcement Learning | | 0 |
| Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models | | 0 |
| Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning | Code | 0 |
| Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration | | 0 |
| One-Shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors | | 0 |
| Deep Reinforcement Learning with Double Q-learning | Code | 1 |
| Learning Deep Control Policies for Autonomous Aerial Vehicles with MPC-Guided Policy Search | | 0 |
| Deep Spatial Autoencoders for Visuomotor Learning | Code | 0 |
| Benchmarking for Bayesian Reinforcement Learning | | 0 |
| Optimization of anemia treatment in hemodialysis patients via reinforcement learning | | 0 |
| Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies | | 0 |
| Recurrent Reinforcement Learning: A Hybrid Approach | | 0 |
| Continuous control with deep reinforcement learning | Code | 1 |
| Reinforcement Learning with Parameterized Actions | Code | 0 |
| Giraffe: Using Deep Reinforcement Learning to Play Chess | Code | 1 |
| Hyper-parameter Optimisation of Gaussian Process Reinforcement Learning for Statistical Dialogue Management | | 0 |
| Optimising Turn-Taking Strategies With Reinforcement Learning | | 0 |
| Reinforcement Learning of Multi-Issue Negotiation Dialogue Policies | | 0 |
| Reinforcement Learning in Multi-Party Trading Dialog | | 0 |
| A Cognitive Architecture Based on a Learning Classifier System with Spiking Classifiers | | 0 |
| Learning Efficient Representations for Reinforcement Learning | | 0 |
| Multi-agent Reinforcement Learning with Sparse Interactions by Negotiation and Knowledge Transfer | | 0 |
| Distributed Deep Q-Learning | | 0 |
| Action-Conditional Video Prediction using Deep Networks in Atari Games | Code | 0 |
| A Reinforcement Learning Approach to Online Learning of Decision Trees | | 0 |
| Reinforcement Learning for the Unit Commitment Problem | | 0 |
| Maximum Entropy Deep Inverse Reinforcement Learning | Code | 0 |
| Massively Parallel Methods for Deep Reinforcement Learning | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |