SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback on its actions, given as rewards or penalties. The goal of reinforcement learning is to find the optimal policy, that is, the decision-making strategy that maximizes long-term reward.
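
As a rough illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning, which is only one of many RL algorithms and is not prescribed by this page. The five-state corridor environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameters are assumptions chosen for the example.

```python
import random

N_STATES = 5          # hypothetical toy corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01   # reward at the goal, small penalty otherwise
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))                           # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])   # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned policy should move right in every state, since that is the shortest path to the only positive reward. The discount factor `gamma` is what makes the agent weigh long-term reward rather than only the immediate feedback.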

Papers

Showing 3101-3150 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning | | 0 |
| Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints with Time Windows | | 0 |
| Dialogue Shaping: Empowering Agents through NPC Interaction | | 0 |
| TrackAgent: 6D Object Tracking via Reinforcement Learning | | 0 |
| Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search | Code | 0 |
| ETHER: Aligning Emergent Communication for Hindsight Experience Replay | | 0 |
| Primitive Skill-based Robot Learning from Human Evaluative Feedback | | 0 |
| Approximate Model-Based Shielding for Safe Reinforcement Learning | Code | 0 |
| Reinforcement Learning by Guided Safe Exploration | | 0 |
| Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation | | 0 |
| Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks | | 0 |
| Mode-constrained Model-based Reinforcement Learning via Gaussian Processes | Code | 0 |
| Reinforcement Learning -based Adaptation and Scheduling Methods for Multi-source DASH | Code | 0 |
| Communication-Efficient Orchestrations for URLLC Service via Hierarchical Reinforcement Learning | | 0 |
| Submodular Reinforcement Learning | Code | 1 |
| Offline Reinforcement Learning with On-Policy Q-Function Regularization | | 0 |
| Unbiased Weight Maximization | | 0 |
| Settling the Sample Complexity of Online Reinforcement Learning | | 0 |
| Counterfactual Explanation Policies in RL | | 0 |
| Structural Credit Assignment with Coordinated Exploration | | 0 |
| The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation | | 0 |
| ExWarp: Extrapolation and Warping-based Temporal Supersampling for High-frequency Displays | | 0 |
| Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning | | 0 |
| On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0 |
| Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control | Code | 1 |
| HIQL: Offline Goal-Conditioned RL with Latent States as Actions | Code | 1 |
| Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations | | 0 |
| DIP-RL: Demonstration-Inferred Preference Learning in Minecraft | | 0 |
| Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization | Code | 1 |
| Towards practical reinforcement learning for tokamak magnetic control | | 0 |
| JoinGym: An Efficient Query Optimization Environment for Reinforcement Learning | Code | 1 |
| Bridging the Reality Gap of Reinforcement Learning based Traffic Signal Control using Domain Randomization and Meta Learning | | 0 |
| Reparameterized Policy Learning for Multimodal Trajectory Optimization | | 0 |
| PyTAG: Challenges and Opportunities for Reinforcement Learning in Tabletop Games | Code | 1 |
| A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading | | 0 |
| Explaining Autonomous Driving Actions with Visual Question Answering | Code | 1 |
| Benchmarking Potential Based Rewards for Learning Humanoid Locomotion | Code | 2 |
| Towards A Unified Agent with Foundation Models | | 0 |
| Data Cross-Segmentation for Improved Generalization in Reinforcement Learning Based Algorithmic Trading | | 0 |
| Distributed 3D-Beam Reforming for Hovering-Tolerant UAVs Communication over Coexistence: A Deep-Q Learning for Intelligent Space-Air-Ground Integrated Networks | | 0 |
| IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit based on Analyses of Interestingness | Code | 0 |
| REX: Rapid Exploration and eXploitation for AI Agents | | 0 |
| Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees | | 0 |
| Quarl: A Learning-Based Quantum Circuit Optimizer | | 0 |
| Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation | Code | 1 |
| An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient | | 0 |
| Basal-Bolus Advisor for Type 1 Diabetes (T1D) Patients Using Multi-Agent Reinforcement Learning (RL) Methodology | | 0 |
| Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning | | 0 |
| Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning | | 0 |
| POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |