SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback on its actions, given as rewards or penalties. The goal is to find the optimal policy, that is, the decision-making strategy that maximizes long-term reward.
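
The interaction loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and does not come from any paper listed on this page: a hypothetical 5-state corridor environment, tabular Q-learning as the learning rule, and the helper names `step`, `choose_action`, `train`, and `greedy_policy`.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: move along the corridor; reward 1.0 on reaching the goal."""
    move = 1 if action == 1 else -1
    next_state = max(0, min(N_STATES - 1, state + move))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning: learn action values from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Policy that picks the highest-valued action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the learned greedy policy should move right in every non-terminal state, since that is the shortest path to the only reward. The discount factor `gamma` is what makes the agent prefer reaching the reward sooner, which is one common way to formalize "long-term reward".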

Papers

Showing 4801-4850 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges | | 0 |
| Reinforcement Learning Under Probabilistic Spatio-Temporal Constraints with Time Windows | | 0 |
| Using Implicit Behavior Cloning and Dynamic Movement Primitive to Facilitate Reinforcement Learning for Robot Motion Planning | | 0 |
| PIMbot: Policy and Incentive Manipulation for Multi-Robot Reinforcement Learning in Social Dilemmas | Code | 0 |
| Shrink-Perturb Improves Architecture Mixing during Population Based Training for Neural Architecture Search | Code | 0 |
| Primitive Skill-based Robot Learning from Human Evaluative Feedback | | 0 |
| TrackAgent: 6D Object Tracking via Reinforcement Learning | | 0 |
| Dialogue Shaping: Empowering Agents through NPC Interaction | | 0 |
| ETHER: Aligning Emergent Communication for Hindsight Experience Replay | | 0 |
| Approximate Model-Based Shielding for Safe Reinforcement Learning | Code | 0 |
| Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation | | 0 |
| Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks | | 0 |
| Reinforcement Learning by Guided Safe Exploration | | 0 |
| Mode-constrained Model-based Reinforcement Learning via Gaussian Processes | Code | 0 |
| Unbiased Weight Maximization | | 0 |
| Structural Credit Assignment with Coordinated Exploration | | 0 |
| The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation | | 0 |
| Reinforcement Learning -based Adaptation and Scheduling Methods for Multi-source DASH | Code | 0 |
| Offline Reinforcement Learning with On-Policy Q-Function Regularization | | 0 |
| Settling the Sample Complexity of Online Reinforcement Learning | | 0 |
| Counterfactual Explanation Policies in RL | | 0 |
| Communication-Efficient Orchestrations for URLLC Service via Hierarchical Reinforcement Learning | | 0 |
| ExWarp: Extrapolation and Warping-based Temporal Supersampling for High-frequency Displays | | 0 |
| Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning | | 0 |
| On the Effectiveness of Offline RL for Dialogue Response Generation | Code | 0 |
| DIP-RL: Demonstration-Inferred Preference Learning in Minecraft | | 0 |
| Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations | | 0 |
| Bridging the Reality Gap of Reinforcement Learning based Traffic Signal Control using Domain Randomization and Meta Learning | | 0 |
| Towards practical reinforcement learning for tokamak magnetic control | | 0 |
| Reparameterized Policy Learning for Multimodal Trajectory Optimization | | 0 |
| A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading | | 0 |
| Distributed 3D-Beam Reforming for Hovering-Tolerant UAVs Communication over Coexistence: A Deep-Q Learning for Intelligent Space-Air-Ground Integrated Networks | | 0 |
| Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees | | 0 |
| Data Cross-Segmentation for Improved Generalization in Reinforcement Learning Based Algorithmic Trading | | 0 |
| IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit based on Analyses of Interestingness | Code | 0 |
| Towards A Unified Agent with Foundation Models | | 0 |
| REX: Rapid Exploration and eXploitation for AI Agents | | 0 |
| Quarl: A Learning-Based Quantum Circuit Optimizer | | 0 |
| Basal-Bolus Advisor for Type 1 Diabetes (T1D) Patients Using Multi-Agent Reinforcement Learning (RL) Methodology | | 0 |
| An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient | | 0 |
| Discovering User Types: Mapping User Traits by Task-Specific Behaviors in Reinforcement Learning | | 0 |
| POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance | Code | 0 |
| Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning | | 0 |
| Seeing is not Believing: Robust Reinforcement Learning against Spurious Correlation | | 0 |
| Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty | | 0 |
| An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets | | 0 |
| Combining model-predictive control and predictive reinforcement learning for stable quadrupedal robot locomotion | | 0 |
| Why Guided Dialog Policy Learning performs well? Understanding the role of adversarial learning and its alternative | | 0 |
| Transformers in Reinforcement Learning: A Survey | | 0 |
| Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior | | 0 |
Page 97 of 303

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |