SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
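The interaction loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from any listed paper: a hypothetical five-state chain environment, a uniformly random exploration policy, tabular Q-learning as the learning rule, and a discount factor `gamma` used to weight long-term reward.

```python
import random

N_STATES = 5           # hypothetical toy task: states 0..4 on a line
GOAL = N_STATES - 1    # reaching state 4 ends the episode
LEFT, RIGHT = 0, 1     # the two available actions


def step(state, action):
    """Environment: move one cell; reward 1.0 only when the goal is reached."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train_q_table(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning with a uniformly random behavior policy (assumed setup)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice((LEFT, RIGHT))           # explore at random
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max((LEFT, RIGHT), key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    q_table = train_q_table()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy should choose `RIGHT` in every non-terminal state, since that is the shortest path to the only reward. Real RL problems differ mainly in scale: the table is typically replaced by a function approximator and random exploration by a more careful strategy.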

Papers

Showing 3101-3125 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| PIMbot: Policy and Incentive Manipulation for Multi-Robot Reinforcement Learning in Social Dilemmas | Code | 0 |
| Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints with Time Windows |  | 0 |
| Dialogue Shaping: Empowering Agents through NPC Interaction |  | 0 |
| Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search | Code | 0 |
| ETHER: Aligning Emergent Communication for Hindsight Experience Replay |  | 0 |
| TrackAgent: 6D Object Tracking via Reinforcement Learning |  | 0 |
| Primitive Skill-based Robot Learning from Human Evaluative Feedback |  | 0 |
| Approximate Model-Based Shielding for Safe Reinforcement Learning | Code | 0 |
| Reinforcement Learning by Guided Safe Exploration |  | 0 |
| Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation |  | 0 |
| Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks |  | 0 |
| Mode-constrained Model-based Reinforcement Learning via Gaussian Processes | Code | 0 |
| Reinforcement Learning -based Adaptation and Scheduling Methods for Multi-source DASH | Code | 0 |
| Communication-Efficient Orchestrations for URLLC Service via Hierarchical Reinforcement Learning |  | 0 |
| Offline Reinforcement Learning with On-Policy Q-Function Regularization |  | 0 |
| Submodular Reinforcement Learning | Code | 1 |
| Settling the Sample Complexity of Online Reinforcement Learning |  | 0 |
| The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation |  | 0 |
| Counterfactual Explanation Policies in RL |  | 0 |
| Unbiased Weight Maximization |  | 0 |
| Structural Credit Assignment with Coordinated Exploration |  | 0 |
| ExWarp: Extrapolation and Warping-based Temporal Supersampling for High-frequency Displays |  | 0 |
| Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning |  | 0 |
| On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0 |
| Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control | Code | 1 |
Page 125 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified |