SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or punishments for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
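
As a rough illustration of the agent-environment loop described above, here is a minimal sketch of tabular Q-learning, one common RL method. The corridor environment, the reward values, and the names `step`, `train`, and `greedy_policy` are assumptions made up for this example; they do not come from any paper listed on this page.

```python
import random

# Hypothetical toy environment: a corridor of 5 states (0..4).
# Actions: 0 = left, 1 = right. Entering state 4 gives reward +1 and ends
# the episode; every other step gives a small negative reward of -0.01.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the learned values."""
    return [0 if q_s[0] > q_s[1] else 1 for q_s in q[:-1]]
```

Calling `greedy_policy(train())` on this toy problem should yield a policy that prefers action 1 (move right) in every non-terminal state, since that is the shortest route to the only positive reward. The discount factor `gamma` is what turns per-step rewards into the long-term objective: values closer to 1 weight distant rewards more heavily.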

Papers

Showing 12351-12400 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| An Actor-Critic-Attention Mechanism for Deep Reinforcement Learning in Multi-view Environments | | 0 |
| Delegative Reinforcement Learning: learning to avoid traps with a little help | | 0 |
| Combinatorial Keyword Recommendations for Sponsored Search with Deep Reinforcement Learning | | 0 |
| Convolutional Reservoir Computing for World Models | Code | 0 |
| Self-Attentional Credit Assignment for Transfer in Reinforcement Learning | Code | 0 |
| Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery | | 0 |
| Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration | | 0 |
| Photonic architecture for reinforcement learning | | 0 |
| Zermelo's problem: Optimal point-to-point navigation in 2D turbulent flows using Reinforcement Learning | | 0 |
| CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers | | 0 |
| An Inductive Synthesis Framework for Verifiable Reinforcement Learning | | 0 |
| Improved Reinforcement Learning through Imitation Learning Pretraining Towards Image-based Autonomous Driving | | 0 |
| Model-free Control of Chaos with Continuous Deep Q-learning | | 0 |
| Ranking sentences from product description & bullets for better search | | 0 |
| Mutual Reinforcement Learning | | 0 |
| PPO Dash: Improving Generalization in Deep Reinforcement Learning | Code | 0 |
| Federated Reinforcement Distillation with Proxy Experience Memory | | 0 |
| A Dual Memory Structure for Efficient Use of Replay Memory in Deep Reinforcement Learning | | 0 |
| Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning | Code | 0 |
| Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling | Code | 0 |
| Environment Reconstruction with Hidden Confounders for Reinforcement Learning based Recommendation | | 0 |
| A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning | | 0 |
| Imitation-Projected Programmatic Reinforcement Learning | | 0 |
| DisCoRL: Continual Reinforcement Learning via Policy Distillation | | 0 |
| Provably Efficient Reinforcement Learning with Linear Function Approximation | Code | 0 |
| Reinforcement Learning with Chromatic Networks for Compact Architecture Search | | 0 |
| Regularizing Neural Networks for Future Trajectory Prediction via Inverse Reinforcement Learning Framework | Code | 0 |
| DOB-Net: Actively Rejecting Unknown Excessive Time-Varying Disturbances | | 0 |
| Interpretable Dynamics Models for Data-Efficient Reinforcement Learning | | 0 |
| Deep Reinforcement-Learning-based Driving Policy for Autonomous Road Vehicles | | 0 |
| Assessing Transferability from Simulation to Reality for Reinforcement Learning | | 0 |
| Capturing Financial markets to apply Deep Reinforcement Learning | | 0 |
| Dreaming machine learning: Lipschitz extensions for reinforcement learning on financial markets | | 0 |
| Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations | Code | 0 |
| Variance-Based Risk Estimations in Markov Processes via Transformation with State Lumping | | 0 |
| On-Policy Robot Imitation Learning from a Converging Supervisor | | 0 |
| Policy-Gradient Algorithms Have No Guarantees of Convergence in Linear Quadratic Games | | 0 |
| ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning | | 0 |
| Variational Inference MPC for Bayesian Model-based Reinforcement Learning | | 0 |
| Data Efficient Reinforcement Learning for Legged Robots | | 0 |
| Deep Active Inference as Variational Policy Gradients | Code | 0 |
| A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning | | 0 |
| Intrinsic Motivation Driven Intuitive Physics Learning using Deep Reinforcement Learning with Intrinsic Reward Normalization | | 0 |
| Playing Flappy Bird via Asynchronous Advantage Actor Critic Algorithm | | 0 |
| On Inductive Biases in Deep Reinforcement Learning | | 0 |
| Self-supervised Learning of Distance Functions for Goal-Conditioned Reinforcement Learning | | 0 |
| Deep Reinforcement Learning For Modeling Chit-Chat Dialog With Discrete Attributes | | 0 |
| Learning a Behavioral Repertoire from Demonstrations | | 0 |
| Incrementally Learning Functions of the Return | | 0 |
| Attentive Multi-Task Deep Reinforcement Learning | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |