SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
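
The agent-environment loop described above can be sketched with tabular Q-learning on a toy problem. This is only a minimal illustration, not taken from any paper listed here. The corridor environment, the names `step`, `train`, and `greedy_policy`, and all hyperparameter values are assumptions made for the example.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells.
# The agent starts in cell 0 and receives reward +1 on reaching cell 4.
N_STATES = 5
GOAL = N_STATES - 1
MOVES = (-1, +1)  # action 0 = left, action 1 = right


def step(state, action):
    """Apply one action and return (next_state, reward, done)."""
    next_state = min(max(state + MOVES[action], 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit, breaking ties at random so early episodes still move.
                best = max(q[state])
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward plus discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read off the learned policy for every non-terminal state."""
    names = ("left", "right")
    return [names[max(range(2), key=lambda a: q[s][a])] for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

In this sketch the reward arrives only at the goal, so the discount factor `gamma` is what makes shorter paths more valuable. The learned policy should settle on moving right in every cell.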

Papers

Showing 14451-14500 of 15113 papers

Title | Status | Hype
Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | Code | 0
Centralized Model and Exploration Policy for Multi-Agent RL | Code | 0
Inverse reinforcement learning for video games | Code | 0
Decomposing Control Lyapunov Functions for Efficient Reinforcement Learning | Code | 0
A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation | Code | 0
Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration | Code | 0
GAN Q-learning | Code | 0
Efficient and Scalable Deep Reinforcement Learning for Mean Field Control Games | Code | 0
Efficient Architecture Search by Network Transformation | Code | 0
CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning | Code | 0
Gap-Dependent Unsupervised Exploration for Reinforcement Learning | Code | 0
Learning RL-Policies for Joint Beamforming Without Exploration: A Batch Constrained Off-Policy Approach | Code | 0
Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning | Code | 0
Efficient bimanual handover and rearrangement via symmetry-aware actor-critic learning | Code | 0
CEIP: Combining Explicit and Implicit Priors for Reinforcement Learning with Demonstrations | Code | 0
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning | Code | 0
Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control | Code | 0
Gaussian Processes for Data-Efficient Learning in Robotics and Control | Code | 0
Inverse Reinforcement Learning in Contextual MDPs | Code | 0
DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent Reinforcement Learning | Code | 0
Causal State Distillation for Explainable Reinforcement Learning | Code | 0
A Study on Overfitting in Deep Reinforcement Learning | Code | 0
Efficient Decoupled Neural Architecture Search by Structure and Operation Sampling | Code | 0
Efficient Deep Reinforcement Learning via Adaptive Policy Transfer | Code | 0
Efficient Deep Reinforcement Learning with Predictive Processing Proximal Policy Optimization | Code | 0
Efficient Dialog Policy Learning via Positive Memory Retention | Code | 0
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL | Code | 0
A Study of Reinforcement Learning for Neural Machine Translation | Code | 0
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models | Code | 0
Causal Reasoning from Meta-reinforcement Learning | Code | 0
Inter-Level Cooperation in Hierarchical Reinforcement Learning | Code | 0
Imitation Learning by Reinforcement Learning | Code | 0
Active exploration in parameterized reinforcement learning | Code | 0
Imitation Learning for Sentence Generation with Dilated Convolutions Using Adversarial Training | Code | 0
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning | Code | 0
Generalised Discount Functions applied to a Monte-Carlo AImu Implementation | Code | 0
Agent-Time Attention for Sparse Rewards Multi-Agent Reinforcement Learning | Code | 0
Causal Campbell-Goodhart's law and Reinforcement Learning | Code | 0
Model-based Reinforcement Learning for Continuous Control with Posterior Sampling | Code | 0
Learning When to Treat Business Processes: Prescriptive Process Monitoring with Causal Inference and Reinforcement Learning | Code | 0
Catastrophic Interference in Reinforcement Learning: A Solution Based on Context Division and Knowledge Distillation | Code | 0
Generalizable Resource Allocation in Stream Processing via Deep Reinforcement Learning | Code | 0
Case-Based Inverse Reinforcement Learning Using Temporal Coherence | Code | 0
Efficient Exploration through Bayesian Deep Q-Networks | Code | 0
Cascaded LSTMs based Deep Reinforcement Learning for Goal-driven Dialogue | Code | 0
Generalization and Exploration via Randomized Value Functions | Code | 0
Generalization and Regularization in DQN | Code | 0
Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees | Code | 0
Learning Robust Reward Machines from Noisy Labels | Code | 0
Adaptive ROI Generation for Video Object Segmentation Using Reinforcement Learning | Code | 0
Page 290 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified