SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to act in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
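
To make the agent-environment loop concrete, here is a minimal sketch of tabular Q-learning, one common RL method, on a hypothetical five-state chain. The environment, the function names (`step`, `choose_action`, `train`, `greedy_policy`) and the hyperparameter values are all assumptions made for illustration; none of them come from the papers listed on this page.

```python
import random

# Hypothetical toy problem: a chain of 5 states, start at 0, goal at 4.
N_STATES = 5
ACTIONS = (-1, +1)   # index 0 = step left, index 1 = step right
GAMMA = 0.9          # discount factor for future rewards
ALPHA = 0.5          # learning rate
EPSILON = 0.1        # probability of taking a random exploratory action


def step(state, action_index):
    """Environment dynamics: reward 1.0 on reaching the goal, else 0.0."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, rng):
    """Epsilon-greedy choice; ties between equal values are broken randomly."""
    if rng.random() < EPSILON or q_row[0] == q_row[1]:
        return rng.randrange(len(ACTIONS))
    return 0 if q_row[0] > q_row[1] else 1


def train(episodes=500, seed=0):
    """Run tabular Q-learning and return the learned action-value table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best future value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action index for each non-terminal state."""
    return [0 if row[0] > row[1] else 1 for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the reward arrives only at the goal, so the discount factor is what makes shorter paths more valuable; the learned greedy policy should therefore step right in every non-terminal state.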

Papers

Showing 551-575 of 15113 papers

Title | Status | Hype
Attention Actor-Critic algorithm for Multi-Agent Constrained Co-operative Reinforcement Learning | Code | 1
Decentralized Structural-RNN for Robot Crowd Navigation with Deep Reinforcement Learning | Code | 1
Decomposed Soft Actor-Critic Method for Cooperative Multi-Agent Reinforcement Learning | Code | 1
Asynchronous Methods for Deep Reinforcement Learning | Code | 1
A SWAT-based Reinforcement Learning Framework for Crop Management | Code | 1
Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration | Code | 1
Debiased Contrastive Learning | Code | 1
Data-Efficient Reinforcement Learning with Self-Predictive Representations | Code | 1
Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments | Code | 1
DataLight: Offline Data-Driven Traffic Signal Control | Code | 1
A Sustainable Ecosystem through Emergent Cooperation in Multi-Agent Reinforcement Learning | Code | 1
Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots | Code | 1
Dataset Reset Policy Optimization for RLHF | Code | 1
Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function | Code | 1
Curriculum Reinforcement Learning using Optimal Transport via Gradual Domain Adaptation | Code | 1
Curriculum Offline Imitation Learning | Code | 1
D2RL: Deep Dense Architectures in Reinforcement Learning | Code | 1
Curriculum-based Asymmetric Multi-task Reinforcement Learning | Code | 1
CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | Code | 1
Curriculum-based Reinforcement Learning for Distribution System Critical Load Restoration | Code | 1
Curious Hierarchical Actor-Critic Reinforcement Learning | Code | 1
CURL: Contrastive Unsupervised Representation Learning for Reinforcement Learning | Code | 1
Aspect Sentiment Triplet Extraction Using Reinforcement Learning | Code | 1
A2C is a special case of PPO | Code | 1
Asset Allocation: From Markowitz to Deep Reinforcement Learning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified