SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal of reinforcement learning is to find an optimal policy, that is, a decision-making strategy that maximizes the long-term reward.
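The interaction loop described above can be sketched with tabular Q-learning, one of many RL methods and not one this page singles out. Everything in the sketch is an assumption made for illustration: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and the hyperparameter values are all hypothetical.

```python
import random

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: reward 1.0 for reaching the right end, else 0.0."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values from reward feedback with tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore occasionally (and on ties), else exploit.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))  # expected: [1, 1, 1, 1] (always move right)
```

The learned policy moves right in every state because that is the only way to reach the reward; the discount factor `gamma` makes nearer rewards worth more, which is what ties the per-step updates to the long-term objective.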

Papers

Showing 2001-2050 of 15113 papers

Title | Status | Hype
CompoSuite: A Compositional Reinforcement Learning Benchmark | Code | 1
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning | Code | 1
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning | Code | 1
State-wise Constrained Policy Optimization | Code | 1
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling | Code | 1
Compositional Reinforcement Learning from Logical Specifications | Code | 1
Compound AI Systems Optimization: A Survey of Methods, Challenges, and Future Directions | Code | 1
Compiler Optimization for Quantum Computing Using Reinforcement Learning | Code | 1
Stranger Danger! Identifying and Avoiding Unpredictable Pedestrians in RL-based Social Robot Navigation | Code | 1
Strategically Conservative Q-Learning | Code | 1
An Optimistic Perspective on Offline Reinforcement Learning | Code | 1
Learning to Paint With Model-based Deep Reinforcement Learning | Code | 1
Style-Agnostic Reinforcement Learning | Code | 1
Subequivariant Graph Reinforcement Learning in 3D Environments | Code | 1
Competitiveness of MAP-Elites against Proximal Policy Optimization on locomotion tasks in deterministic simulations | Code | 1
Analysis of diversity-accuracy tradeoff in image captioning | Code | 1
A Boolean Task Algebra for Reinforcement Learning | Code | 1
Supported Policy Optimization for Offline Reinforcement Learning | Code | 1
Compile Scene Graphs with Reinforcement Learning | Code | 1
SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning | Code | 1
Computational Performance of Deep Reinforcement Learning to find Nash Equilibria | Code | 1
Swapped goal-conditioned offline reinforcement learning | Code | 1
BEAR: Physics-Principled Building Environment for Control and Reinforcement Learning | Code | 1
A Composable Specification Language for Reinforcement Learning Tasks | Code | 1
Symbolic Relational Deep Reinforcement Learning based on Graph Neural Networks and Autoregressive Policy Decomposition | Code | 1
Symbolic Visual Reinforcement Learning: A Scalable Framework with Object-Level Abstraction and Differentiable Expression Search | Code | 1
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks | Code | 1
An Attentive Graph Agent for Topology-Adaptive Cyber Defence | Code | 1
Comparing Observation and Action Representations for Deep Reinforcement Learning in μRTS | Code | 1
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning | Code | 1
A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning | Code | 1
CDT: Cascading Decision Trees for Explainable Reinforcement Learning | Code | 1
Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking | Code | 1
Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges | Code | 1
Adaptive Transformers in RL | Code | 1
Behavior From the Void: Unsupervised Active Pre-Training | Code | 1
CommonPower: A Framework for Safe Data-Driven Smart Grid Control | Code | 1
Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings | Code | 1
Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation | Code | 1
Teal: Learning-Accelerated Optimization of WAN Traffic Engineering | Code | 1
Behavior Proximal Policy Optimization | Code | 1
TEMPERA: Test-Time Prompting via Reinforcement Learning | Code | 1
Communicative Reinforcement Learning Agents for Landmark Detection in Brain Images | Code | 1
Comparing Popular Simulation Environments in the Scope of Robotics and Reinforcement Learning | Code | 1
Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines | Code | 1
Text Generation by Learning from Demonstrations | Code | 1
Concise Reasoning via Reinforcement Learning | Code | 1
An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization | Code | 1
Combining Reinforcement Learning and Constraint Programming for Combinatorial Optimization | Code | 1
Combining Reinforcement Learning with Lin-Kernighan-Helsgaun Algorithm for the Traveling Salesman Problem | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified