SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback, which arrives as rewards or punishments for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
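The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not a method from any paper listed here: the `TwoArmedBandit` environment, its payoff probabilities, and the epsilon-greedy `run_agent` helper are all hypothetical names chosen for the example. It shows an agent acting, receiving a reward, updating its estimates, and how a cumulative (discounted) reward might be computed.

```python
import random


def discounted_return(rewards, gamma=0.99):
    """Cumulative reward G = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


class TwoArmedBandit:
    """Hypothetical toy environment: arm 1 pays off more often than arm 0."""

    def __init__(self, probs=(0.2, 0.8), seed=0):
        self.probs = probs
        self.rng = random.Random(seed)

    def step(self, action):
        # Reward of 1 with the chosen arm's probability, otherwise 0.
        return 1.0 if self.rng.random() < self.probs[action] else 0.0


def run_agent(env, steps=2000, epsilon=0.1, seed=1):
    """Epsilon-greedy agent keeping a running average reward per action."""
    rng = random.Random(seed)
    values = [0.0, 0.0]  # estimated reward of each action
    counts = [0, 0]      # how often each action was tried
    rewards = []
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)  # explore: pick a random action
        else:
            action = max(range(2), key=lambda a: values[a])  # exploit
        reward = env.step(action)  # feedback from the environment
        counts[action] += 1
        # Incremental mean update of the action-value estimate.
        values[action] += (reward - values[action]) / counts[action]
        rewards.append(reward)
    return values, rewards


values, rewards = run_agent(TwoArmedBandit())
print("estimated action values:", values)
print("total reward:", sum(rewards))
```

In this sketch the "policy" is simply to pick the action with the highest estimated value most of the time; the occasional random action is what lets the agent discover that the second arm pays better.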

Papers

Showing 8276-8300 of 15113 papers

Title | Status | Hype
Resmax: An Alternative Soft-Greedy Operator for Reinforcement Learning | | 0
Plan Your Target and Learn Your Skills: State-Only Imitation Learning via Decoupled Policy Optimization | | 0
Self-Supervised Structured Representations for Deep Reinforcement Learning | | 0
Multi-Agent Reinforcement Learning with Shared Resource in Inventory Management | | 0
Theoretical understanding of adversarial reinforcement learning via mean-field optimal control | | 0
Multi-batch Reinforcement Learning via Sample Transfer and Imitation Learning | | 0
The Remarkable Effectiveness of Combining Policy and Value Networks in A*-based Deep RL for AI Planning | | 0
Offline-Online Reinforcement Learning: Extending Batch and Online RL | | 0
P4O: Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization | | 0
Rethinking Pareto Approaches in Constrained Reinforcement Learning | | 0
Offline Pre-trained Multi-Agent Decision Transformer | | 0
Should I Run Offline Reinforcement Learning or Behavioral Cloning? | | 0
Selective Token Generation for Few-shot Language Modeling | | 0
Offline Reinforcement Learning for Large Scale Language Action Spaces | | 0
Task-driven Discovery of Perceptual Schemas for Generalization in Reinforcement Learning | | 0
Targeted Environment Design from Offline Data | | 0
Revisiting the Monotonicity Constraint in Cooperative Multi-Agent Reinforcement Learning | | 0
Offline Reinforcement Learning with Resource Constrained Online Deployment | | 0
Towards Understanding Distributional Reinforcement Learning: Regularization, Optimization, Acceleration and Sinkhorn Algorithm | | 0
Towards Unknown-aware Deep Q-Learning | | 0
Model-based Reinforcement Learning with Ensembled Model-value Expansion | | 0
Rewardless Open-Ended Learning (ROEL) | | 0
Transformers are Meta-Reinforcement Learners | | 0
Triangular Dropout: Variable Network Width without Retraining | | 0
MOBA: Multi-teacher Model Based Reinforcement Learning | | 0
Page 332 of 605

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified