SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback: each action yields a reward or a penalty. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
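As an illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning, one of many RL algorithms and not one this page prescribes. The `ChainEnv` corridor environment, the function names, and all hyperparameter values are hypothetical choices made for this example only.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start in state 0, reward 1.0 on reaching the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by interacting with ChainEnv (assumed hyperparameters)."""
    rng = random.Random(seed)
    env = ChainEnv()
    q = [[0.0, 0.0] for _ in range(env.n_states)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore at random sometimes (and on ties), else exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Bootstrapped target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state (ties resolve to 'right')."""
    return [0 if left > right else 1 for left, right in q]
```

Calling `greedy_policy(train_q_learning())` returns one action per state. In this toy setting the learned policy should move right in every non-terminal state, since that is the only way to collect the reward; the discount factor `gamma` is what makes shorter paths to the reward preferable.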

Papers

Showing 3451-3475 of 15113 papers

Title | Status | Hype
Guided Cooperation in Hierarchical Reinforcement Learning via Model-based Rollout | Code | 0
Guided Deep Reinforcement Learning for Swarm Systems | Code | 0
Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization | Code | 0
Guided Dialogue Policy Learning without Adversarial Learning in the Loop | Code | 0
Guiding Evolutionary Strategies by Differentiable Robot Simulators | Code | 0
Growing Action Spaces | Code | 0
Group Equivariant Deep Reinforcement Learning | Code | 0
Adversarial Intrinsic Motivation for Reinforcement Learning | Code | 0
Group-driven Reinforcement Learning for Personalized mHealth Intervention | Code | 0
gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach | Code | 0
Grounding Language for Transfer in Deep Reinforcement Learning | Code | 0
Deep Neuroevolution of Recurrent and Discrete World Models | Code | 0
Guide Actor-Critic for Continuous Control | Code | 0
Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games | Code | 0
GraphNAS: Graph Neural Architecture Search with Reinforcement Learning | Code | 0
Deep Ordinal Reinforcement Learning | Code | 0
Graph Convolutional Reinforcement Learning | Code | 0
DeepPath: A Reinforcement Learning Method for Knowledge Graph Reasoning | Code | 0
A learning gap between neuroscience and reinforcement learning | Code | 0
Operator World Models for Reinforcement Learning | Code | 0
GREEN-CODE: Learning to Optimize Energy Efficiency in LLM-based Code Generation | Code | 0
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning | Code | 0
Grammars and reinforcement learning for molecule optimization | Code | 0
ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay | Code | 0
Graph Backup: Data Efficient Backup Exploiting Markovian Transitions | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified