SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
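
The interaction loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from any paper listed on this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helper names, the choice of tabular Q-learning as the learning rule, and the hyperparameter values.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # Reward 1.0 only when the terminal state is reached, otherwise 0.0.
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Explore with probability epsilon, or when both actions tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

With these settings the greedy policy should pick action 1 (move right) in every non-terminal state, since that is the shortest route to the only reward. The discount factor `gamma` is what makes the agent prefer reaching the reward sooner, which is one simple way of expressing "long-term reward".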

Papers

Showing 7801–7825 of 15113 papers

Title | Status | Hype
Safe Exploration by Solving Early Terminated MDP |  | 0
Policy Gradient Methods for Distortion Risk Measures |  | 0
BayesSimIG: Scalable Parameter Inference for Adaptive Domain Randomization with IsaacGym | Code | 1
Inferring Probabilistic Reward Machines from Non-Markovian Reward Processes for Reinforcement Learning |  | 0
Aligning an optical interferometer with beam divergence control and continuous action space | Code | 0
Learning Interaction-aware Guidance Policies for Motion Planning in Dense Traffic Scenarios |  | 0
Attend2Pack: Bin Packing through Deep Reinforcement Learning with Attention |  | 0
Offline reinforcement learning with uncertainty for treatment strategies in sepsis |  | 0
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning |  | 0
Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning | Code | 0
CLAIM: Curriculum Learning Policy for Influence Maximization in Unknown Social Networks |  | 0
Automated Gain Control Through Deep Reinforcement Learning for Downstream Radar Object Detection |  | 0
Adaptive Stress Testing for Adversarial Learning in a Financial Environment |  | 0
Adaptation of Quadruped Robot Locomotion with Meta-Learning |  | 0
Enhancing Video Analytics Accuracy via Real-time Automated Camera Parameter Tuning |  | 0
Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers | Code | 1
Sublinear Regret for Learning POMDPs |  | 0
Towards Autonomous Pipeline Inspection with Hierarchical Reinforcement Learning |  | 0
Offline Meta-Reinforcement Learning with Online Self-Supervision | Code | 1
Federated Model Search via Reinforcement Learning |  | 0
Learning Time-Invariant Reward Functions through Model-Based Inverse Reinforcement Learning |  | 0
DORA: Toward Policy Optimization for Task-oriented Dialogue System with Efficient Context | Code | 0
Quadruped Locomotion on Non-Rigid Terrain using Reinforcement Learning |  | 0
Pseudo-Model-Free Hedging for Variable Annuities via Deep Reinforcement Learning |  | 0
Distributed Online Service Coordination Using Deep Reinforcement Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified