SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to choose actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term (cumulative) reward.
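As a minimal illustration of this interaction loop, the sketch below runs tabular Q-learning (one standard RL algorithm, chosen here only for brevity and not taken from any of the listed papers) on a hypothetical five-cell corridor. The environment, the reward of +1 at the last cell, and the names `step`, `discounted_return`, `q_learning`, and `greedy_policy` are all illustrative assumptions. The agent starts knowing nothing, explores with an epsilon-greedy rule, and gradually learns that moving right maximizes its discounted cumulative reward.

```python
import random

# Hypothetical toy environment: a 1-D corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell ends the episode with reward +1.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def discounted_return(rewards, gamma=0.9):
    """Cumulative reward G = r_1 + gamma * r_2 + gamma^2 * r_3 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table q[state][action] of estimated discounted returns."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties at random.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move the estimate toward the one-step bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state according to the learned table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q = q_learning()
    print(greedy_policy(q))  # expected: [1, 1, 1, 1] (always move right)
    print(discounted_return([0.0, 0.0, 1.0]))  # reward delayed by two steps
```

The discount factor `gamma` is what turns "long-term reward" into a single number: a reward that arrives later counts for less, so the learned values favor the shortest path to the goal.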

Papers

Showing 12901–12950 of 15113 papers

Title | Status | Hype
Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions | Code | 0
Predicting Real-time Scientific Experiments Using Transformer models and Reinforcement Learning | Code | 0
Predicting optimal value functions by interpolating reward functions in scalarized multi-objective reinforcement learning | Code | 0
On Instrumental Variable Regression for Deep Offline Policy Evaluation | Code | 0
Revisiting the Softmax Bellman Operator: New Benefits and New Perspective | Code | 0
Reinforcement Learning under Threats | Code | 0
MyCaffe: A Complete C# Re-Write of Caffe with Reinforcement Learning | Code | 0
Towards Similarity Graphs Constructed by Deep Reinforcement Learning | Code | 0
Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach | Code | 0
On Improving Deep Reinforcement Learning for POMDPs | Code | 0
ViZDoom Competitions: Playing Doom from Pixels | Code | 0
Modular Networks Prevent Catastrophic Interference in Model-Based Multi-Task Reinforcement Learning | Code | 0
Reward Certification for Policy Smoothed Reinforcement Learning | Code | 0
Reward-Conditioned Policies | Code | 0
Text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method | Code | 0
Reinforcement Learning to Rank in E-Commerce Search Engine: Formalization, Analysis, and Application | Code | 0
Reward Delay Attacks on Deep Reinforcement Learning | Code | 0
Reward Design For An Online Reinforcement Learning Algorithm Supporting Oral Self-Care | Code | 0
Single Episode Policy Transfer in Reinforcement Learning | Code | 0
Reward Design for Reinforcement Learning Agents | Code | 0
On Credit Assignment in Hierarchical Reinforcement Learning | Code | 0
Reinforcement learning to learn quantum states for Heisenberg scaling accuracy | Code | 0
Single-partition adaptive Q-learning | Code | 0
Reward Engineering for Generating Semi-structured Explanation | Code | 0
Reward Engineering for Object Pick and Place Training | Code | 0
Reward Estimation for Variance Reduction in Deep Reinforcement Learning | Code | 0
On Context Distribution Shift in Task Representation Learning for Offline Meta RL | Code | 0
Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective | Code | 0
Meta-Reinforcement Learning in Broad and Non-Parametric Environments | Code | 0
Towards Solving Text-based Games by Producing Adaptive Action Spaces | Code | 0
Mutual Information Based Knowledge Transfer Under State-Action Dimension Mismatch | Code | 0
TGRPO: Fine-tuning Vision-Language-Action Model via Trajectory-wise Group Relative Policy Optimization | Code | 0
Towards Symbolic Reinforcement Learning with Common Sense | Code | 0
SiT: Symmetry-Invariant Transformers for Generalisation in Reinforcement Learning | Code | 0
What is the Solution for State-Adversarial Multi-Agent Reinforcement Learning? | Code | 0
Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization | Code | 0
Unified State Representation Learning under Data Augmentation | Code | 0
Rewarding Coreference Resolvers for Being Consistent with World Knowledge | Code | 0
On Catastrophic Interference in Atari 2600 Games | Code | 0
PPO Dash: Improving Generalization in Deep Reinforcement Learning | Code | 0
OIL-AD: An Anomaly Detection Framework for Sequential Decision Sequences | Code | 0
The Arcade Learning Environment: An Evaluation Platform for General Agents | Code | 0
PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation | Code | 0
Mutation Testing of Deep Reinforcement Learning Based on Real Faults | Code | 0
Skill Decision Transformer | Code | 0
Towards the Use of Deep Reinforcement Learning with Global Policy For Query-based Extractive Summarisation | Code | 0
MUSE: Modularizing Unsupervised Sense Embeddings | Code | 0
Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation | Code | 0
Reward learning from human preferences and demonstrations in Atari | Code | 0
The Atari Data Scraper | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified