SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
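As a rough illustration of the "cumulative reward" an agent tries to maximize, the sketch below computes the return of one episode from its list of rewards. The function name `discounted_return` and the discount factor `gamma` are assumptions made for this example; the description above does not specify whether or how future rewards are discounted.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode.

    `gamma` is an assumed discount factor; gamma = 1.0 gives the plain
    (undiscounted) sum of rewards.
    """
    total = 0.0
    # Accumulate from the last reward backwards so each step is one multiply-add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    episode_rewards = [1.0, 0.0, -1.0, 2.0]  # hypothetical feedback from an environment
    print(discounted_return(episode_rewards, gamma=1.0))  # plain cumulative reward
    print(discounted_return(episode_rewards, gamma=0.9))  # later rewards count for less
```

A policy is then judged by the return it collects on average, so comparing two policies amounts to comparing this quantity over the episodes each one produces.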

Papers

Showing 15001-15050 of 15113 papers

Title | Status | Hype
A Regularized Opponent Model with Maximum Entropy Objective | Code | 0
G-PECNet: Towards a Generalizable Pedestrian Trajectory Prediction System | Code | 0
Analyzing Reinforcement Learning Benchmarks with Random Weight Guessing | Code | 0
Explainable Reinforcement Learning Through a Causal Lens | Code | 0
Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning | Code | 0
Continual Reinforcement Learning in 3D Non-stationary Environments | Code | 0
Explainable Reinforcement Learning via Model Transforms | Code | 0
ARCHER: Aggressive Rewards to Counter bias in Hindsight Experience Replay | Code | 0
Left Ventricle Contouring in Cardiac Images Based on Deep Reinforcement Learning | Code | 0
Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion | Code | 0
Continual Reinforcement Learning for HVAC Systems Control: Integrating Hypernetworks and Transfer Learning | Code | 0
Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient | Code | 0
Analysis and Control of a Planar Quadrotor | Code | 0
MAP Propagation Algorithm: Faster Learning with a Team of Reinforcement Learning Agents | Code | 0
Explaining Reinforcement Learning Policies through Counterfactual Trajectories | Code | 0
HAMMER: Multi-Level Coordination of Reinforcement Learning Agents via Learned Messaging | Code | 0
Explaining RL Decisions with Trajectories | Code | 0
Explain Your Move: Understanding Agent Actions Using Focused Feature Saliency | Code | 0
Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution | Code | 0
Continual Policy Distillation of Reinforcement Learning-based Controllers for Soft Robotic In-Hand Manipulation | Code | 0
A Centralised Soft Actor Critic Deep Reinforcement Learning Approach to District Demand Side Management through CityLearn | Code | 0
Explanation-Aware Experience Replay in Rule-Dense Environments | Code | 0
Handling Delay in Real-Time Reinforcement Learning | Code | 0
Explicable Reward Design for Reinforcement Learning Agents | Code | 0
Explicit Explore-Exploit Algorithms in Continuous State Spaces | Code | 0
Learning to Bid Long-Term: Multi-Agent Reinforcement Learning with Long-Term and Sparse Reward in Repeated Auction Games | Code | 0
Learning Graph-Enhanced Commander-Executor for Multi-Agent Navigation | Code | 0
An agentic system with reinforcement-learned subsystem improvements for parsing form-like documents | Code | 0
Adaptive Data Exploitation in Deep Reinforcement Learning | Code | 0
Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL | Code | 0
Adversarial Learning for Neural Dialogue Generation | Code | 0
Adversarial Intrinsic Motivation for Reinforcement Learning | Code | 0
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning | Code | 0
Learning Heuristics for Quantified Boolean Formulas through Deep Reinforcement Learning | Code | 0
Combining imagination and heuristics to learn strategies that generalize | Code | 0
KEHRL: Learning Knowledge-Enhanced Language Representations with Hierarchical Reinforcement Learning | Code | 0
Adaptive Curriculum Generation from Demonstrations for Sim-to-Real Visuomotor Control | Code | 0
Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning | Code | 0
Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning | Code | 0
BadRL: Sparse Targeted Backdoor Attack Against Reinforcement Learning | Code | 0
Baconian: A Unified Open-source Framework for Model-Based Reinforcement Learning | Code | 0
Harnessing Structures for Value-Based Planning and Reinforcement Learning | Code | 0
Continual Learning In Environments With Polynomial Mixing Times | Code | 0
Kernel Density Bayesian Inverse Reinforcement Learning | Code | 0
Learning to search efficiently for causally near-optimal treatments | Code | 0
Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal | Code | 0
Exploiting Multiple Abstractions in Episodic RL via Reward Shaping | Code | 0
Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning | Code | 0
ARAML: A Stable Adversarial Training Framework for Text Generation | Code | 0
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified