SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
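As a rough illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning, one of many RL algorithms and chosen here only because it is short. The corridor environment, the `step` and `train` helpers, and all hyperparameter values are hypothetical and are not taken from any paper listed on this page.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment (an assumption): returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def discounted_return(rewards, gamma):
    """Cumulative reward with discount factor gamma: sum of gamma^t * r_t."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-known action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states 0..3.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

In this toy setting the learned greedy policy should step right in every non-terminal state, since that is the shortest route to the only reward. Real benchmarks such as those listed below use far larger state spaces and function approximation rather than a lookup table.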

Papers

Showing 4801–4825 of 15113 papers

Title | Status | Hype
Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges | - | 0
Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints with Time Windows | - | 0
Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning | - | 0
PIMbot: Policy and Incentive Manipulation for Multi-Robot Reinforcement Learning in Social Dilemmas | Code | 0
Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search | Code | 0
Primitive Skill-based Robot Learning from Human Evaluative Feedback | - | 0
TrackAgent: 6D Object Tracking via Reinforcement Learning | - | 0
Dialogue Shaping: Empowering Agents through NPC Interaction | - | 0
ETHER: Aligning Emergent Communication for Hindsight Experience Replay | - | 0
Approximate Model-Based Shielding for Safe Reinforcement Learning | Code | 0
Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation | - | 0
Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks | - | 0
Reinforcement Learning by Guided Safe Exploration | - | 0
Mode-constrained Model-based Reinforcement Learning via Gaussian Processes | Code | 0
Unbiased Weight Maximization | - | 0
Structural Credit Assignment with Coordinated Exploration | - | 0
The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation | - | 0
Reinforcement Learning -based Adaptation and Scheduling Methods for Multi-source DASH | Code | 0
Offline Reinforcement Learning with On-Policy Q-Function Regularization | - | 0
Settling the Sample Complexity of Online Reinforcement Learning | - | 0
Counterfactual Explanation Policies in RL | - | 0
Communication-Efficient Orchestrations for URLLC Service via Hierarchical Reinforcement Learning | - | 0
ExWarp: Extrapolation and Warping-based Temporal Supersampling for High-frequency Displays | - | 0
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning | - | 0
On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0
Page 193 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified