SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, a decision-making strategy that maximizes long-term reward.
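The interaction loop described above can be sketched with tabular Q-learning on a toy problem. This is only an illustration under stated assumptions: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are hypothetical and do not come from any paper listed on this page. Q-learning is used here simply as one well-known way to learn a policy from reward feedback.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right
GAMMA = 0.9       # discount factor weighting long-term reward (assumed value)


def step(state, action):
    """Toy environment: reward 1.0 only when the rightmost state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Learn action values by interacting with the environment (Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: try a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target = immediate reward plus discounted best future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action for each non-terminal state under the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` should select "move right" in every non-terminal state, since that is the only way to collect the reward in this toy chain. The discount factor makes states closer to the goal more valuable, which is what "maximizing long-term reward" amounts to here.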

Papers

Showing 3551-3575 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Gotta Learn Fast: A New Benchmark for Generalization in RL | Code | 0 |
| Goal-conditioned Imitation Learning | Code | 0 |
| Automatic Discovery of Interpretable Planning Strategies | Code | 0 |
| Playing FPS Games with Deep Reinforcement Learning | Code | 0 |
| Combining Automated Optimisation of Hyperparameters and Reward Shape | Code | 0 |
| Goal-Conditioned Q-Learning as Knowledge Distillation | Code | 0 |
| Combined Reinforcement Learning via Abstract Representations | Code | 0 |
| GLIB: Efficient Exploration for Relational Model-Based Reinforcement Learning via Goal-Literal Babbling | Code | 0 |
| "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations | Code | 0 |
| Policy Consolidation for Continual Reinforcement Learning | Code | 0 |
| Global and Local Analysis of Interestingness for Competency-Aware Deep Reinforcement Learning | Code | 0 |
| Policy Distillation | Code | 0 |
| GFlowNet Training by Policy Gradients | Code | 0 |
| GHQ: Grouped Hybrid Q Learning for Heterogeneous Cooperative Multi-agent Reinforcement Learning | Code | 0 |
| Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment | Code | 0 |
| GFlowNets and variational inference | Code | 0 |
| Gifting in multi-agent reinforcement learning | Code | 0 |
| Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes | Code | 0 |
| Genes in Intelligent Agents | Code | 0 |
| APRIL: Interactively Learning to Summarise by Combining Active Preference Learning and Reinforcement Learning | Code | 0 |
| Learning robust control for LQR systems with multiplicative noise via policy gradient | Code | 0 |
| Generic Itemset Mining Based on Reinforcement Learning | Code | 0 |
| Automatic Goal Generation for Reinforcement Learning Agents | Code | 0 |
| Collision Avoidance Robotics Via Meta-Learning (CARML) | Code | 0 |
| Collision Avoidance in Pedestrian-Rich Environments with Deep Reinforcement Learning | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |