
Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term (cumulative) reward.
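A minimal sketch of the agent-environment loop described above, using tabular Q-learning (one classic RL algorithm, chosen here only for illustration). The five-cell corridor environment, its reward scheme, and the hyperparameters `alpha`, `gamma`, and `epsilon` are hypothetical assumptions, not taken from any paper listed on this page.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a corridor of
# N_STATES cells. The agent starts in cell 0; stepping into the last cell gives
# reward +1 and ends the episode. Every other step gives reward 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)  # moving left from cell 0 stays in cell 0
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon; otherwise act greedily,
            # breaking ties between equally valued actions at random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Reward feedback drives learning: move Q(s, a) toward the one-step
            # target r + gamma * max_a' Q(s', a'). Terminal states add no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal cell under the learned Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = q_learning()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy should be to move right in every cell, and the learned value of moving right from the start cell should approach gamma^3 = 0.729: the reward arrives on the fourth step, so it is discounted three times. The discount factor `gamma` is what turns "long-term reward" into a single number the agent can maximize.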

Papers

Showing 12151–12200 of 15113 papers

Title | Status | Hype
Towards an Adaptive Robot for Sports and Rehabilitation Coaching | | 0
Modeling Sensorimotor Coordination as Multi-Agent Reinforcement Learning with Differentiable Communication | | 0
Reinforcement Learning for Portfolio Management | Code | 0
Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning | | 0
Efficiently Breaking the Curse of Horizon in Off-Policy Evaluation with Double Reinforcement Learning | | 0
Joint Inference of Reward Machines and Policies for Reinforcement Learning | | 0
Correlation Priors for Reinforcement Learning | | 0
Reinforcement Learning Models of Human Behavior: Reward Processing in Mental Disorders | | 0
RecSim: A Configurable Simulation Platform for Recommender Systems | Code | 0
Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning | | 0
Modelling Working Memory using Deep Recurrent Reinforcement Learning | | 0
Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning | Code | 0
On Memory Mechanism in Multi-Agent Reinforcement Learning | | 0
Transfer of Temporal Logic Formulas in Reinforcement Learning | | 0
Q-Learning Based Aerial Base Station Placement for Fairness Enhancement in Mobile Networks | | 0
Signal Instructed Coordination in Cooperative Multi-agent Reinforcement Learning | | 0
MAT: Multi-Fingered Adaptive Tactile Grasping via Deep Reinforcement Learning | | 0
Reinforcement Learning and Video Games | | 0
Sampling Strategies for GAN Synthetic Data | | 0
Discovery of Useful Questions as Auxiliary Tasks | | 0
Deep Reinforcement Learning Algorithm for Dynamic Pricing of Express Lanes with Multiple Access Locations | Code | 0
Learning Transferable Domain Priors for Safe Exploration in Reinforcement Learning | | 0
A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots | Code | 0
Clickbait? Sensational Headline Generation with Auto-tuned Reinforcement Learning | Code | 0
Exploratory Combinatorial Optimization with Reinforcement Learning | Code | 0
AC-Teach: A Bayesian Actor-Critic Method for Policy Learning with an Ensemble of Suboptimal Teachers | Code | 0
DEAR: Deep Reinforcement Learning for Online Advertising Impression in Recommender Systems | | 0
Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning | | 0
Deterministic Value-Policy Gradients | | 0
Solving Continual Combinatorial Selection via Deep Reinforcement Learning | | 0
Recommendation System-based Upper Confidence Bound for Online Advertising | | 0
Off-Policy Evaluation in Partially Observable Environments | | 0
Partner Approximating Learners (PAL): Simulation-Accelerated Learning with Explicit Partner Modeling in Multi-Agent Domains | | 0
Neural Architecture Search in Embedding Space | | 0
Option Encoder: A Framework for Discovering a Policy Basis in Reinforcement Learning | | 0
Self-driving scale car trained by Deep reinforcement learning | | 0
Personalized HeartSteps: A Reinforcement Learning Algorithm for Optimizing Physical Activity | | 0
Imitation Learning for Human Pose Prediction | | 0
Deep Reinforcement Learning for Control of Probabilistic Boolean Networks | Code | 0
Automatic Financial Trading Agent for Low-risk Portfolio Management using Deep Reinforcement Learning | | 0
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning | Code | 0
Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning | | 0
Gradient Q(σ, λ): A Unified Algorithm with Function Approximation for Reinforcement Learning | | 0
Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information | | 0
DRLViz: Understanding Decisions and Memory in Deep Reinforcement Learning | Code | 0
Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control | Code | 0
Building Task-Oriented Visual Dialog Systems Through Alternative Optimization Between Dialog Policy and Language Generation | | 0
Reinforcement Learning for Joint Optimization of Multiple Rewards | | 0
Adaptive Trust Region Policy Optimization: Global Convergence and Faster Rates for Regularized MDPs | | 0
Classification with Costly Features as a Sequential Decision-Making Problem | Code | 0
Page 244 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified