SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
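The interaction loop described above can be sketched in a few lines. This page does not prescribe an algorithm; the example below uses tabular Q-learning purely as one common illustration. The five-state chain environment, the `step` and `train` functions, and all hyperparameter values are assumptions made up for this sketch.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (assumed toy layout)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment: reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, or when both actions tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

In this toy setup the learned greedy policy moves right in every non-terminal state, since that is the only way to reach the rewarded terminal state. The discount factor `gamma` is what makes the return "long-term": states further from the goal receive smaller, discounted values.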

Papers

Showing 9526-9550 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Distributional Reinforcement Learning for mmWave Communications with Intelligent Reflectors on a UAV | | 0 |
| Online Observer-Based Inverse Reinforcement Learning | | 0 |
| Generalization to New Actions in Reinforcement Learning | Code | 1 |
| Self-Driving Network and Service Coordination Using Deep Reinforcement Learning | Code | 1 |
| Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models | | 0 |
| Sample-efficient reinforcement learning using deep Gaussian processes | | 0 |
| Exact Asymptotics for Linear Quadratic Adaptive Control | Code | 0 |
| Incorporating Rivalry in Reinforcement Learning for a Competitive Game | | 0 |
| Depth Self-Optimized Learning Toward Data Science | Code | 0 |
| Information-theoretic Task Selection for Meta-Reinforcement Learning | | 0 |
| Fast Reinforcement Learning with Incremental Gaussian Mixture Models | | 0 |
| Cooperative Heterogeneous Deep Reinforcement Learning | | 0 |
| Instance based Generalization in Reinforcement Learning | Code | 0 |
| Causal Campbell-Goodhart's law and Reinforcement Learning | Code | 0 |
| Interpreting Graph Drawing with Multi-Agent Reinforcement Learning | | 0 |
| A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting | | 0 |
| NEARL: Non-Explicit Action Reinforcement Learning for Robotic Control | | 0 |
| Reinforcement Learning of Structured Control for Linear Systems with Unknown State Matrix | | 0 |
| Multi-Agent Reinforcement Learning for Visibility-based Persistent Monitoring | Code | 0 |
| Reinforcement Learning with Efficient Active Feature Acquisition | | 0 |
| Production-based Cognitive Models as a Test Suite for Reinforcement Learning Algorithms | | 0 |
| Reinforcement Learning with Imbalanced Dataset for Data-to-Text Medical Report Generation | | 0 |
| Guided Dialogue Policy Learning without Adversarial Learning in the Loop | Code | 0 |
| Few-Shot Multi-Hop Relation Reasoning over Knowledge Bases | | 0 |
| Task-Completion Dialogue Policy Learning via Monte Carlo Tree Search with Dueling Network | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |