SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment by trial and error and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term (typically discounted) cumulative reward.
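The agent-environment loop and the search for a reward-maximizing policy can be illustrated with tabular Q-learning, one standard RL algorithm (not one specific to any paper listed below). This is a minimal sketch under stated assumptions: the six-state corridor environment, its reward of 1.0 at the goal, and the hyperparameters GAMMA, ALPHA and EPSILON are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy task: a corridor of states 0..5; state 5 is the terminal goal.
N_STATES = 6
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right
GAMMA = 0.9           # discount factor weighting long-term reward (assumed value)
ALPHA = 0.5           # learning rate (assumed value)
EPSILON = 0.1         # exploration rate for epsilon-greedy selection (assumed value)


def step(state, action):
    """Toy environment: reward 1.0 only when the goal state is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the discounted return of taking action a in state s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore occasionally, otherwise exploit current estimates
            # (ties between equally valued actions are broken at random).
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted value of the best next action.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Return the index of the highest-valued action for each non-terminal state."""
    return [max(range(len(ACTIONS)), key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Run as a script, it prints the greedy action index for each non-terminal state. With these settings the learned policy should be to move right (index 1) everywhere, which is optimal here because discounting makes the shortest path to the reward the most valuable one.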

Papers

Showing 13701-13750 of 15113 papers

Title | Status | Hype
A unified strategy for implementing curiosity and empowerment driven reinforcement learning | | 0
Learning from Outside the Viability Kernel: Why we Should Build Robots that can Fall with Grace | | 0
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress | | 0
Learning Policy Representations in Multiagent Systems | | 0
Handling Cold-Start Collaborative Filtering with Reinforcement Learning | | 0
BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning | Code | 0
Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents | Code | 0
Surprising Negative Results for Generative Adversarial Tree Search | Code | 0
Multi-Level Policy and Reward Reinforcement Learning for Image Captioning | | 0
Improving width-based planning with compact policies | | 0
An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method | | 0
Automated Image Data Preprocessing with Deep Reinforcement Learning | Code | 0
Implicit Quantile Networks for Distributional Reinforcement Learning | Code | 0
Deep Reinforcement Learning for Dynamic Urban Transportation Problems | | 0
Adaptive Shooting for Bots in First Person Shooter Games Using Reinforcement Learning | | 0
Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network | | 0
Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors | Code | 0
Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications | Code | 0
Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control | | 0
Learning to Shoot in First Person Shooter Games by Stabilizing Actions and Clustering Rewards for Reinforcement Learning | | 0
Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning | | 0
Multi-Agent Deep Reinforcement Learning with Human Strategies | | 0
Unsupervised Meta-Learning for Reinforcement Learning | | 0
The Potential of the Return Distribution for Exploration in RL | Code | 0
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning | | 0
Deep Reinforcement Learning for Chinese Zero pronoun Resolution | Code | 0
Deep Curiosity Loops in Social Environments | | 0
Implicit Policy for Reinforcement Learning | | 0
Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces | | 0
Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents | | 0
Temporal Difference Variational Auto-Encoder | Code | 0
Program Synthesis Through Reinforcement Learning Guided Tree Search | | 0
Randomized Prior Functions for Deep Reinforcement Learning | Code | 0
Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings | | 0
Deep Variational Reinforcement Learning for POMDPs | Code | 0
Deep Reinforcement Learning for General Video Game AI | Code | 0
Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning | | 0
Relational Deep Reinforcement Learning | Code | 0
Mix&Match - Agent Curricula for Reinforcement Learning | | 0
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling | | 0
TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning | Code | 0
Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise | | 0
Playing Atari with Six Neurons | Code | 0
BindsNET: A machine learning-oriented spiking neural networks library in Python | Code | 0
Challenges in High-dimensional Reinforcement Learning with Evolution Strategies | Code | 0
Adversarial Reinforcement Learning Framework for Benchmarking Collision Avoidance Mechanisms in Autonomous Vehicles | | 0
Exploration in Structured Reinforcement Learning | | 0
Multi-Agent Reinforcement Learning via Double Averaging Primal-Dual Optimization | | 0
Internal Model from Observations for Reward Shaping | | 0
DAQN: Deep Auto-encoder and Q-Network | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified
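The listing does not define how "Mean Normalized Performance" is computed. A common convention in multi-environment RL benchmarks is to min-max normalize each environment's return against reference minimum and maximum returns, then average across environments. The sketch below assumes that convention; the environment names, returns and reference ranges are hypothetical placeholders, not the data behind the scores above.

```python
from statistics import mean


def normalized_score(ret, r_min, r_max):
    """Min-max normalize a raw return so r_min maps to 0.0 and r_max maps to 1.0."""
    return (ret - r_min) / (r_max - r_min)


def mean_normalized_performance(returns, ranges):
    """Average per-environment normalized scores (assumed aggregation: arithmetic mean)."""
    return mean(normalized_score(returns[env], *ranges[env]) for env in returns)


# Hypothetical per-environment returns and (r_min, r_max) reference ranges.
returns = {"env_a": 7.5, "env_b": 20.0}
ranges = {"env_a": (0.0, 10.0), "env_b": (0.0, 40.0)}

if __name__ == "__main__":
    print(mean_normalized_performance(returns, ranges))
```

Here env_a normalizes to 0.75 and env_b to 0.5, giving a mean of 0.625. Normalizing before averaging keeps environments with large raw reward scales from dominating the aggregate.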