SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
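As a rough illustration of the agent-environment loop described above, here is a minimal sketch of tabular Q-learning, one common RL method among many. The five-cell corridor task, the function names (`step`, `choose_action`, `train`, `greedy_policy`), and the hyperparameter values are all assumptions made up for this example; none of them come from the papers listed on this page.

```python
import random

N_STATES = 5          # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: reward 1.0 on reaching the goal, 0.0 otherwise."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection; ties between actions are broken at random."""
    left, right = q[state]
    if rng.random() < epsilon or left == right:
        return rng.choice(ACTIONS)
    return 0 if left > right else 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values Q[state][action] from interaction alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length (assumed value)
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus the discounted
            # value of the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per non-goal state (ties reported as 'right')."""
    return [0 if left > right else 1 for left, right in q[:-1]]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy task the learned policy is the one that maximizes long-term (discounted) reward: always step right toward the goal. Real benchmarks replace the hand-written `step` function with a full simulator and usually replace the table with a function approximator.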

Papers

Showing 13751-13800 of 15113 papers

Title | Status | Hype
Uncertainty-driven Trajectory Truncation for Data Augmentation in Offline Reinforcement Learning | Code | 0
On the Reliability and Generalizability of Brain-inspired Reinforcement Learning Algorithms | Code | 0
TAdam: A Robust Stochastic Gradient Optimizer | Code | 0
DC4L: Distribution Shift Recovery via Data-Driven Control for Deep Learning Models | Code | 0
Model-Ensemble Trust-Region Policy Optimization | Code | 0
MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling | Code | 0
WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies | Code | 0
Towards Interpretable Reinforcement Learning Using Attention Augmented Agents | Code | 0
Provably Correct Optimization and Exploration with Non-linear Policies | Code | 0
Visualizing and Understanding Atari Agents | Code | 0
Monte Carlo Q-learning for General Game Playing | Code | 0
Taming the Noise in Reinforcement Learning via Soft Updates | Code | 0
On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning | Code | 0
Measuring the Reliability of Reinforcement Learning Algorithms | Code | 0
Semifactual Explanations for Reinforcement Learning | Code | 0
Semi-Markov Offline Reinforcement Learning for Healthcare | Code | 0
Semi-Offline Reinforcement Learning for Optimized Text Generation | Code | 0
Uncovering Instabilities in Variational-Quantum Deep Q-Networks | Code | 0
Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning | Code | 0
Semi-supervised Deep Reinforcement Learning in Support of IoT and Smart City Services | Code | 0
On the Perturbed States for Transformed Input-robust Reinforcement Learning | Code | 0
Provable Defense against Backdoor Policies in Reinforcement Learning | Code | 0
Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL | Code | 0
Understanding Adversarial Attacks on Observations in Deep Reinforcement Learning | Code | 0
Model-Based Reinforcement Learning for Atari | Code | 0
Metrics and continuity in reinforcement learning | Code | 0
Efficient Meta Subspace Optimization | Code | 0
Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning | Code | 0
Towards Learning Transferable Conversational Skills using Multi-dimensional Dialogue Modelling | Code | 0
Sentence Simplification with Deep Reinforcement Learning | Code | 0
Value-Free Policy Optimization via Reward Partitioning | Code | 0
Meta Reinforcement Learning with Task Embedding and Shared Policy | Code | 0
Natural Question Generation with Reinforcement Learning Based Graph-to-Sequence Model | Code | 0
On the Importance of Reward Design in Reinforcement Learning-based Dynamic Algorithm Configuration: A Case Study on OneMax with (1+(λ,λ))-GA | Code | 0
Task-Agnostic Dynamics Priors for Deep Reinforcement Learning | Code | 0
Separating value functions across time-scales | Code | 0
xSRL: Safety-Aware Explainable Reinforcement Learning -- Safety as a Product of Explainability | Code | 0
ProtoX: Explaining a Reinforcement Learning Agent via Prototyping | Code | 0
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient | Code | 0
Sequence Adaptation via Reinforcement Learning in Recommender Systems | Code | 0
On the Implementation of a Reinforcement Learning-based Capacity Sharing Algorithm in O-RAN | Code | 0
Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization | Code | 0
Sequence Modeling of Temporal Credit Assignment for Episodic Reinforcement Learning | Code | 0
Cooperative Multi-Agent Reinforcement Learning with Hypergraph Convolution | Code | 0
Towards Model-based Reinforcement Learning for Industry-near Environments | Code | 0
On the Generalization of Representations in Reinforcement Learning | Code | 0
Measuring Interventional Robustness in Reinforcement Learning | Code | 0
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning | Code | 0
Understanding Game-Playing Agents with Natural Language Annotations | Code | 0
Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified