SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
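The interaction loop described above can be illustrated with a minimal sketch. It assumes tabular Q-learning (one common RL algorithm, chosen here only for illustration) on a hypothetical five-state corridor where the agent starts at state 0 and receives a reward of 1 on reaching state 4. The environment, the function names (`step`, `choose_action`, `train`, `greedy_policy`), and the hyperparameters are all assumptions made for this example and do not come from any paper listed on this page.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice; ties between the two actions are broken at random."""
    if rng.random() < epsilon or q[state][0] == q[state][1]:
        return rng.choice(ACTIONS)
    return 0 if q[state][0] > q[state][1] else 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0, max_steps=1000):
    """Run tabular Q-learning and return the learned action-value table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned values."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy should select "move right" (action 1) in every non-terminal state, since that is the shortest route to the only reward. The discount factor `gamma` is what turns a single delayed reward into a long-term objective: states farther from the goal receive smaller values.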

Papers

Showing 14701-14750 of 15113 papers

Title | Status | Hype
Deep Reinforcement Learning for Robotic Manipulation with Asynchronous Off-Policy Updates | | 0
Deep Visual Foresight for Planning Robot Motion | Code | 0
Deep Reinforcement Learning for Tensegrity Robot Locomotion | | 0
Deep Reinforcement Learning for Mention-Ranking Coreference Models | Code | 0
UbuntuWorld 1.0 LTS - A Platform for Automated Problem Solving & Troubleshooting in the Ubuntu OS | | 0
Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game | | 0
Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer | | 0
Input Convex Neural Networks | Code | 0
Modelling Stock-market Investors as Reinforcement Learning Agents [Correction] | | 0
Opponent Modeling in Deep Reinforcement Learning | Code | 0
Towards Deep Symbolic Reinforcement Learning | | 0
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient | Code | 0
Playing FPS Games with Deep Reinforcement Learning | Code | 0
Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning | Code | 0
Interactive Spoken Content Retrieval by Deep Reinforcement Learning | | 0
Exploration Potential | | 0
The Option-Critic Architecture | Code | 0
Stochastic evolution in populations of ideas | | 0
Bayesian Reinforcement Learning: A Survey | | 0
A Threshold-based Scheme for Reinforcement Learning in Neural Networks | Code | 0
A centralized reinforcement learning method for multi-agent job scheduling in Grid | | 0
Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks | | 0
Dialogue manager domain adaptation using Gaussian process reinforcement learning | | 0
Unifying task specification in reinforcement learning | | 0
Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access | Code | 0
Reward Function and Initial Values: Better Choices for Accelerated Goal-Directed Reinforcement Learning | | 0
Single photon in hierarchical architecture for physical reinforcement learning: Photon intelligence | | 0
Adaptive Probabilistic Trajectory Optimization via Efficient Approximate Inference | | 0
Modeling Human Reading with Neural Attention | | 0
Reinforcement Learning algorithms for regret minimization in structured Markov Decision Processes | | 0
Open Problem: Approximate Planning of POMDPs in the class of Memoryless Policies | | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | | 0
Perceptual Reward Functions | | 0
Posterior Sampling for Reinforcement Learning Without Episodes | Code | 0
On Lower Bounds for Regret in Reinforcement Learning | | 0
Neuroevolution-Based Inverse Reinforcement Learning | | 0
Online Adaptation of Deep Architectures with Reinforcement Learning | | 0
Discovering Latent States for Model Learning: Applying Sensorimotor Contingencies Theory and Predictive Processing to Model Context | | 0
Self-organization in a distributed coordination game through heuristic rules | | 0
A Sensorimotor Reinforcement Learning Framework for Physical Human-Robot Interaction | | 0
Accelerating Stochastic Composition Optimization | | 0
An Actor-Critic Algorithm for Sequence Prediction | Code | 0
Playing Atari Games with Deep Reinforcement Learning and Human Checkpoint Replay | Code | 0
Sequential Cost-Sensitive Feature Acquisition | | 0
Automatic Bridge Bidding Using Deep Reinforcement Learning | | 0
A Greedy Approach to Adapting the Trace Parameter for Temporal Difference Learning | Code | 0
Why is Posterior Sampling Better than Optimism for Reinforcement Learning? | | 0
Is the Bellman residual a bad proxy? | | 0
Unsupervised preprocessing for Tactile Data | | 0
Simultaneous Control and Human Feedback in the Training of a Robotic Agent with Actor-Critic Reinforcement Learning | | 0
Page 295 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified