
Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term (cumulative) reward.
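The agent-environment loop described above can be sketched with tabular Q-learning, chosen here only as one illustrative RL algorithm among many. This is a minimal sketch under stated assumptions: `ChainEnv`, `q_learning`, `greedy_policy`, the five-state chain, and the hyperparameters (`alpha`, `gamma`, `epsilon`) are all hypothetical and not taken from any paper listed below. The cumulative reward is discounted by `gamma`, and the learned policy is read off greedily from the Q-table.

```python
import random


class ChainEnv:
    """Hypothetical 1-D chain: states 0..n-1, start at 0, reward 1.0 on reaching n-1."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = move left, action 1 = move right (clamped at both ends)
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1,
               max_steps=100, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            # Explore with probability epsilon (or on ties), otherwise exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Bootstrapped target; no future value after a terminal transition.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action per state (ties resolve to action 1)."""
    return [0 if a0 > a1 else 1 for a0, a1 in q]


env = ChainEnv(n_states=5)
q = q_learning(env)
# In non-terminal states the learned policy should favor action 1 (move right).
print(greedy_policy(q))
```

Because the only reward sits at the right end of the chain and is discounted by `gamma` per step, moving right has the highest long-term value in every non-terminal state, which is what the greedy read-out recovers.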

Papers

Showing 13701–13750 of 15113 papers

Title | Status | Hype
Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior | Code | 0
Self-Correcting Models for Model-Based Reinforcement Learning | Code | 0
MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator | Code | 0
Near-optimal Deep Reinforcement Learning Policies from Data for Zone Temperature Control | Code | 0
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning | Code | 0
Opponent Modeling in Deep Reinforcement Learning | Code | 0
Pseudo-Rehearsal: Achieving Deep Reinforcement Learning without Catastrophic Forgetting | Code | 0
Opponent Aware Reinforcement Learning | Code | 0
Towards Finding Longer Proofs | Code | 0
MICo: Improved representations via sampling-based state similarity for Markov decision processes | Code | 0
Optimality Inductive Biases and Agnostic Guidelines for Offline Reinforcement Learning | Code | 0
Self-Guided Evolution Strategies with Historical Estimated Gradients | Code | 0
OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching | Code | 0
Zeroth-Order Actor-Critic: An Evolutionary Framework for Sequential Decision Problems | Code | 0
Systematic Rectification of Language Models via Dead-end Analysis | Code | 0
Self-Imitation Learning for Robot Tasks with Sparse and Delayed Rewards | Code | 0
Near Optimal Behavior via Approximate State Abstraction | Code | 0
Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes | Code | 0
MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning | Code | 0
Operator World Models for Reinforcement Learning | Code | 0
Self-Learning Exploration and Mapping for Mobile Robots via Deep Reinforcement Learning | Code | 0
Memory-based Deep Reinforcement Learning for Obstacle Avoidance in UAV with Limited Environment Knowledge | Code | 0
Tackling Asymmetric and Circular Sequential Social Dilemmas with Reinforcement Learning and Graph-based Tit-for-Tat | Code | 0
Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing | Code | 0
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks | Code | 0
VacSIM: Learning Effective Strategies for COVID-19 Vaccine Distribution using Reinforcement Learning | Code | 0
Self-Paced Context Evaluation for Contextual Reinforcement Learning | Code | 0
Proximal Distilled Evolutionary Reinforcement Learning | Code | 0
Proximal Curriculum with Task Correlations for Deep Reinforcement Learning | Code | 0
Learning Progress Driven Multi-Agent Curriculum | Code | 0
Uncertainty-Aware Reward-Free Exploration with General Function Approximation | Code | 0
Memory Augmented Self-Play | Code | 0
Proximal Curriculum for Reinforcement Learning Agents | Code | 0
Model-Free Adaptive Optimal Control of Episodic Fixed-Horizon Manufacturing Processes using Reinforcement Learning | Code | 0
Self Punishment and Reward Backfill for Deep Q-Learning | Code | 0
Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces | Code | 0
Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation | Code | 0
Provably Efficient Reinforcement Learning with Linear Function Approximation | Code | 0
MOOSS: Mask-Enhanced Temporal Contrastive Learning for Smooth State Evolution in Visual Reinforcement Learning | Code | 0
Navigating Demand Uncertainty in Container Shipping: Deep Reinforcement Learning for Enabling Adaptive and Feasible Master Stowage Planning | Code | 0
VIPeR: Provably Efficient Algorithm for Offline RL with Neural Function Approximation | Code | 0
Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions | Code | 0
On the Unreasonable Efficiency of State Space Clustering in Personalization Tasks | Code | 0
Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning | Code | 0
On the Reuse Bias in Off-Policy Reinforcement Learning | Code | 0
Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control | Code | 0
Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs | Code | 0
Self-Supervised State-Control through Intrinsic Mutual Information Rewards | Code | 0
Welfare and Fairness in Multi-objective Reinforcement Learning | Code | 0
Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | n/a | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | n/a | Unverified