SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
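The interaction loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, which is only one of many RL methods and is chosen here purely for brevity. The five-state corridor environment, the `step` function, and all hyperparameter values are hypothetical and not taken from any paper listed on this page.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment (an assumption for illustration).

    Returns (next_state, reward, done). Reward is 1.0 only on reaching the goal.
    """
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so the untrained agent does not get stuck going left.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    """Run the agent-environment loop and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best action per non-terminal state according to the Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this toy setting the learned greedy policy should choose "right" in every non-terminal state, since that is the only way to collect the reward. The discount factor `gamma` is what makes earlier arrival at the goal preferable to later arrival.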

Papers

Showing 12901-12950 of 15113 papers

Title | Status | Hype
The StarCraft Multi-Agent Challenge | Code | 1
Whole-Chain Recommendations | | 0
A Bandit Framework for Optimal Selection of Reinforcement Learning Agents | | 0
Reinforcement Learning from Hierarchical Critics | Code | 0
Distributional reinforcement learning with linear function approximation | | 0
Rethinking the Discount Factor in Reinforcement Learning: A Decision Theoretic Approach | | 0
Novelty Search for Deep Reinforcement Learning Policy Network Weights by Action Sequence Edit Metric Distance | Code | 0
Metaoptimization on a Distributed System for Deep Reinforcement Learning | | 0
Visual search and recognition for robot task execution and monitoring | | 0
Bayesian Reinforcement Learning via Deep, Sparse Sampling | Code | 0
Artificial Intelligence for Prosthetics - challenge solutions | Code | 0
Neural Fictitious Self-Play on ELF Mini-RTS | | 0
On L_2-consistency of nearest neighbor matching | | 0
Space Navigator: a Tool for the Optimization of Collision Avoidance Maneuvers | | 0
Decentralized Multi-Agents by Imitation of a Centralized Controller | | 0
A Guiding Principle for Causal Decision Problems | | 0
Adaptive Stress Testing for Autonomous Vehicles | | 0
Separating value functions across time-scales | Code | 0
Reinforcement Learning for Optimal Load Distribution Sequencing in Resource-Sharing System | | 0
Polyphonic Music Composition with LSTM Neural Networks and Reinforcement Learning | | 0
Learning to Schedule Communication in Multi-agent Reinforcement Learning | Code | 0
AlphaStar: An Evolutionary Computation Perspective | | 0
Interactively shaping robot behaviour with unlabeled human instructions | | 0
Total stochastic gradient algorithms and applications in reinforcement learning | | 0
The Natural Language of Actions | Code | 0
PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos | Code | 0
Value-aware Recommendation based on Reinforced Profit Maximization in E-commerce Systems | | 0
A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning | Code | 0
Certified Reinforcement Learning with Logic Guidance | Code | 1
Learning User Preferences via Reinforcement Learning with Spatial Interface Valuing | | 0
When Collaborative Filtering Meets Reinforcement Learning | | 0
Non-asymptotic Analysis of Biased Stochastic Approximation Scheme | | 0
Visual Rationalizations in Deep Reinforcement Learning for Atari Games | | 0
Policy Consolidation for Continual Reinforcement Learning | Code | 0
Privacy Preserving Off-Policy Evaluation | | 0
Competitive Experience Replay | | 0
Learning Action Representations for Reinforcement Learning | | 0
Joint Entity Linking with Deep Reinforcement Learning | | 0
A Geometric Perspective on Optimal Representations for Reinforcement Learning | | 0
An Optimization Framework for Task Sequencing in Curriculum Learning | | 0
Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective | Code | 0
Tsallis Reinforcement Learning: A Unified Framework for Maximum Entropy Reinforcement Learning | | 0
Addressing Sample Complexity in Visual Tasks Using HER and Hallucinatory GANs | Code | 0
Successor Features Combine Elements of Model-Free and Model-based Reinforcement Learning | | 0
The Value Function Polytope in Reinforcement Learning | | 0
Probability Functional Descent: A Unifying Perspective on GANs, Variational Inference, and Reinforcement Learning | | 0
Privacy-preserving Q-Learning with Functional Noise in Continuous State Spaces | Code | 0
A Comparative Analysis of Expected and Distributional Reinforcement Learning | | 0
Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement | | 0
Safe, Efficient, and Comfortable Velocity Control based on Reinforcement Learning for Autonomous Driving | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | | Unverified