SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find the optimal policy, that is, the decision-making strategy that maximizes long-term reward.
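As a rough illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning, one common RL algorithm chosen only as an example. The five-cell corridor task, the function names (`step`, `train`, `greedy_policy`), and all hyperparameter values are assumptions made for this sketch and do not come from any paper listed on this page.

```python
import random

N_STATES = 5        # hypothetical toy task: corridor cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)  # move one cell left or one cell right


def step(state, action):
    """Environment dynamics: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each non-terminal cell."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this toy setting the only reward sits at the right end of the corridor, so the learned greedy policy should move right in every cell. Real tasks differ mainly in the size of the state space and in how the value estimates are represented.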

Papers

Showing 14651-14700 of 15113 papers

Title | Status | Hype
Learning State Abstractions for Transfer in Continuous Control | Code | 0
Knowledge Transfer in Deep Reinforcement Learning via an RL-Specific GAN-Based Correspondence Function | Code | 0
An Empirical Study of Deep Reinforcement Learning in Continuing Tasks | Code | 0
Improving Automatic Source Code Summarization via Deep Reinforcement Learning | Code | 0
A Simple, Fast Diverse Decoding Algorithm for Neural Generation | Code | 0
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning | Code | 0
Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning | Code | 0
Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning | Code | 0
Empowerment-driven Exploration using Mutual Information Estimation | Code | 0
A2-RL: Aesthetics Aware Reinforcement Learning for Image Cropping | Code | 0
Improving Coordination in Small-Scale Multi-Agent Deep Reinforcement Learning through Memory-driven Communication | Code | 0
Enabling Adaptive Agent Training in Open-Ended Simulators by Targeting Diversity | Code | 0
A General, Evolution-Inspired Reward Function for Social Robotics | Code | 0
A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning | Code | 0
Curiosity Killed or Incapacitated the Cat and the Asymptotically Optimal Agent | Code | 0
On the Expressivity of Neural Networks for Deep Reinforcement Learning | Code | 0
Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn | Code | 0
Curiosity-Driven Multi-Criteria Hindsight Experience Replay | Code | 0
IRLAS: Inverse Reinforcement Learning for Architecture Search | Code | 0
Improving Dialogue Management: Quality Datasets vs Models | Code | 0
GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms | Code | 0
CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning | Code | 0
Iroko: A Framework to Prototype Reinforcement Learning for Data Center Traffic Control | Code | 0
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment | Code | 0
Improving Environment Robustness of Deep Reinforcement Learning Approaches for Autonomous Racing Using Bayesian Optimization-based Curriculum Learning | Code | 0
End-to-end grasping policies for human-in-the-loop robots via deep reinforcement learning | Code | 0
GFlowNets and variational inference | Code | 0
End-to-End Learning of Communications Systems Without a Channel Model | Code | 0
Learning of feature points without additional supervision improves reinforcement learning from images | Code | 0
GFlowNet Training by Policy Gradients | Code | 0
Improving Experience Replay through Modeling of Similar Transitions' Sets | Code | 0
End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes | Code | 0
End-to-End Model-Free Reinforcement Learning for Urban Driving using Implicit Affordances | Code | 0
Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Games | Code | 0
GHQ: Grouped Hybrid Q Learning for Heterogeneous Cooperative Multi-agent Reinforcement Learning | Code | 0
Gifting in multi-agent reinforcement learning | Code | 0
Bootstrap Advantage Estimation for Policy Optimization in Reinforcement Learning | Code | 0
Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents | Code | 0
End-to-End Reinforcement Learning for Automatic Taxonomy Induction | Code | 0
End-to-End Reinforcement Learning for Torque Based Variable Height Hopping | Code | 0
Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies | Code | 0
End-to-End Robotic Reinforcement Learning without Reward Engineering | Code | 0
End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks | Code | 0
CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics | Code | 0
End-to-End Video Captioning with Multitask Reinforcement Learning | Code | 0
A general class of surrogate functions for stable and efficient reinforcement learning | Code | 0
"Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations | Code | 0
Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays | Code | 0
Learning-Driven Exploration for Reinforcement Learning | Code | 0
Energy-Based Hindsight Experience Prioritization | Code | 0
Page 294 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified