SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
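The interaction loop described above can be illustrated with a minimal sketch. It is not taken from any paper listed here. It assumes a hypothetical five-state chain environment in which only reaching the right end yields a reward, and it uses tabular Q-learning with epsilon-greedy exploration as one common way to learn a policy from reward feedback. All names (`step`, `train`, `greedy_policy`) and hyperparameters are illustrative assumptions.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment: reward 1.0 only when the right end is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                # exploit, breaking ties at random so early episodes still move
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # bootstrap from the next state unless the episode has ended
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Action with the highest learned value in each state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES)]


if __name__ == "__main__":
    q_table = train()
    # Greedy action for each non-terminal state
    print(greedy_policy(q_table)[:N_STATES - 1])
```

In this toy setting the learned values for moving right approach the discounted reward (1.0 next to the goal, then 0.9, 0.81, and so on), so the greedy policy heads toward the rewarding state. This is the "long-term reward" idea in miniature.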

Papers

Showing 13601-13650 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| A Dissection of Overfitting and Generalization in Continuous Reinforcement Learning | | 0 |
| Skilled Experience Catalogue: A Skill-Balancing Mechanism for Non-Player Characters using Reinforcement Learning | | 0 |
| Sim-to-Real Reinforcement Learning for Deformable Object Manipulation | Code | 0 |
| Reinforcement Learning using Augmented Neural Networks | | 0 |
| RUDDER: Return Decomposition for Delayed Rewards | Code | 0 |
| A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress | | 0 |
| A unified strategy for implementing curiosity and empowerment driven reinforcement learning | | 0 |
| Learning from Outside the Viability Kernel: Why we Should Build Robots that can Fall with Grace | | 0 |
| Learning Policy Representations in Multiagent Systems | | 0 |
| Handling Cold-Start Collaborative Filtering with Reinforcement Learning | | 0 |
| BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning | Code | 0 |
| Scheduled Policy Optimization for Natural Language Communication with Intelligent Agents | Code | 0 |
| Multi-Level Policy and Reward Reinforcement Learning for Image Captioning | | 0 |
| Surprising Negative Results for Generative Adversarial Tree Search | Code | 0 |
| Improving width-based planning with compact policies | | 0 |
| An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method | | 0 |
| Automated Image Data Preprocessing with Deep Reinforcement Learning | Code | 0 |
| Implicit Quantile Networks for Distributional Reinforcement Learning | Code | 0 |
| Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network | | 0 |
| Maximum a Posteriori Policy Optimisation | Code | 1 |
| Adaptive Shooting for Bots in First Person Shooter Games Using Reinforcement Learning | | 0 |
| Deep Reinforcement Learning for Dynamic Urban Transportation Problems | | 0 |
| Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control | | 0 |
| Marginal Policy Gradients: A Unified Family of Estimators for Bounded Action Spaces with Applications | Code | 0 |
| Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors | Code | 0 |
| Learning to Shoot in First Person Shooter Games by Stabilizing Actions and Clustering Rewards for Reinforcement Learning | | 0 |
| Meta-Learning Transferable Active Learning Policies by Deep Reinforcement Learning | | 0 |
| Unsupervised Meta-Learning for Reinforcement Learning | | 0 |
| Multi-Agent Deep Reinforcement Learning with Human Strategies | | 0 |
| The Potential of the Return Distribution for Exploration in RL | Code | 0 |
| An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning | | 0 |
| Implicit Policy for Reinforcement Learning | | 0 |
| Deep Curiosity Loops in Social Environments | | 0 |
| Deep Reinforcement Learning for Chinese Zero pronoun Resolution | Code | 0 |
| Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces | | 0 |
| Program Synthesis Through Reinforcement Learning Guided Tree Search | | 0 |
| Temporal Difference Variational Auto-Encoder | Code | 0 |
| Randomized Prior Functions for Deep Reinforcement Learning | Code | 0 |
| Automatic View Planning with Multi-scale Deep Reinforcement Learning Agents | | 0 |
| Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings | | 0 |
| Deep Variational Reinforcement Learning for POMDPs | Code | 0 |
| Deep Reinforcement Learning for General Video Game AI | Code | 0 |
| Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning | | 0 |
| Relational Deep Reinforcement Learning | Code | 0 |
| Mix&Match - Agent Curricula for Reinforcement Learning | | 0 |
| Playing Atari with Six Neurons | Code | 0 |
| Mitigation of Policy Manipulation Attacks on Deep Q-Networks with Parameter-Space Noise | | 0 |
| TD or not TD: Analyzing the Role of Temporal Differencing in Deep Reinforcement Learning | Code | 0 |
| Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling | | 0 |
| Adversarial Reinforcement Learning Framework for Benchmarking Collision Avoidance Mechanisms in Autonomous Vehicles | | 0 |
Page 273 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |