SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find the optimal policy, that is, the decision-making strategy that maximizes the expected long-term reward.
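
As a rough illustration of the agent-environment loop described above, the following minimal sketch runs tabular Q-learning on a tiny corridor. The environment, the function names (`step`, `train`, `greedy_policy`), and every hyperparameter are assumptions made up for this example. They are not taken from any paper listed on this page, and Q-learning is only one of many ways to learn a policy.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is terminal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action):
    """Toy environment (an assumption for illustration).

    Returns (next_state, reward, done). Reaching the last state pays 1.
    """
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, and also break ties randomly.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned values."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy moves right in every non-terminal state, since that is the shortest route to the only reward. The discount factor `gamma` is what makes the agent prefer reaching the reward sooner rather than later.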

Papers

Showing 14651-14700 of 15113 papers

Title | Status | Hype
Towards deep learning with spiking neurons in energy based models with contrastive Hebbian plasticity | | 0
Stochastic Primal-Dual Methods and Sample Complexity of Reinforcement Learning | | 0
Towards Information-Seeking Agents | | 0
Hierarchy through Composition with Linearly Solvable Markov Decision Processes | | 0
Learning to superoptimize programs - Workshop Version | | 0
Deep Learning of Robotic Tasks without a Simulator using Strong and Weak Human Supervision | | 0
Bayesian Optimization with Robust Bayesian Neural Networks | Code | 0
Bootstrapping incremental dialogue systems: using linguistic knowledge to learn from minimal data | | 0
Generalizing Skills with Semi-Supervised Reinforcement Learning | | 0
Adaptive optimal training of animal behavior | | 0
Linear Feature Encoding for Reinforcement Learning | | 0
Playing Doom with SLAM-Augmented Deep Reinforcement Learning | Code | 0
Showing versus doing: Teaching by demonstration | | 0
Exploration for Multi-task Reinforcement Learning with Deep Generative Models | | 0
Learning to Compose Words into Sentences with Reinforcement Learning | | 0
Improving Policy Gradient by Exploring Under-appreciated Rewards | | 0
Nonparametric General Reinforcement Learning | | 0
Training an Interactive Humanoid Robot Using Multimodal Deep Reinforcement Learning | Code | 0
Deep Reinforcement Learning for Multi-Domain Dialogue Systems | Code | 0
A Simple, Fast Diverse Decoding Algorithm for Neural Generation | Code | 0
Multiscale Inverse Reinforcement Learning using Diffusion Wavelets | | 0
Recurrent Attention Models for Depth-Based Person Identification | | 0
Variational Intrinsic Control | Code | 0
Memory Lens: How Much Memory Does an Agent Use? | | 0
Options Discovery with Budgeted Reinforcement Learning | | 0
A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games | | 0
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU | Code | 0
Learning to reinforcement learn | Code | 0
Reinforcement Learning with Unsupervised Auxiliary Tasks | Code | 0
Reinforcement Learning in Rich-Observation MDPs using Spectral Methods | | 0
Hierarchical Object Detection with Deep Reinforcement Learning | Code | 0
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models | Code | 0
Learning to Navigate in Complex Environments | Code | 0
Fairness in Reinforcement Learning | | 0
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control | | 0
RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning | Code | 0
Reinforcement Learning Approach for Parallelization in Filters Aggregation Based Feature Selection Algorithms | | 0
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic | Code | 0
Designing Neural Network Architectures using Reinforcement Learning | Code | 0
Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning | | 0
Learning to Perform Physics Experiments via Deep Reinforcement Learning | | 0
Modular Multitask Reinforcement Learning with Policy Sketches | Code | 0
Neural Architecture Search with Reinforcement Learning | Code | 0
Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening | Code | 0
Multi-task learning with deep model based reinforcement learning | | 0
Quantile Reinforcement Learning | | 0
Using a Deep Reinforcement Learning Agent for Traffic Signal Control | | 0
Learning Locomotion Skills Using DeepRL: Does the Choice of Action Space Matter? | | 0
Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear | | 0
Learning Runtime Parameters in Computer Systems with Delayed Experience Injection | | 0
Page 294 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified