SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
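The agent-environment loop described above can be sketched with tabular Q-learning on a toy problem. This is only an illustrative sketch, not a method taken from any paper listed on this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are assumptions made for the example.

```python
import random

N_STATES = 5      # hypothetical chain of states 0..4; state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    # The only reward is 1.0 for reaching the terminal state.
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a table of action values q[state][action] with Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action in each non-terminal state under the learned values."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` should return the policy that moves right in every non-terminal state, since that is the shortest route to the only reward. The discount factor `gamma` is what turns "long-term reward" into a concrete quantity: rewards that arrive later count for less.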

Papers

Showing 5101-5125 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes | | 0 |
| Unified Automatic Control of Vehicular Systems with Reinforcement Learning | Code | 1 |
| Solving the vehicle routing problem with deep reinforcement learning | | 0 |
| Reinforcement learning with experience replay and adaptation of action dispersion | | 0 |
| Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis | Code | 0 |
| Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions | | 0 |
| Meta Reinforcement Learning with Successor Feature Based Context | | 0 |
| Combining Evolutionary Search with Behaviour Cloning for Procedurally Generated Content | | 0 |
| Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization | Code | 0 |
| Deep Reinforcement Learning for System-on-Chip: Myths and Realities | | 0 |
| Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning | Code | 1 |
| Graph Inverse Reinforcement Learning from Diverse Videos | | 0 |
| Latent Properties of Lifelong Learning Systems | | 0 |
| RangL: A Reinforcement Learning Competition Platform | | 0 |
| Playing a 2D Game Indefinitely using NEAT and Reinforcement Learning | | 0 |
| Raising Student Completion Rates with Adaptive Curriculum and Contextual Bandits | | 0 |
| POSET-RL: Phase ordering for Optimizing Size and Execution Time using Reinforcement Learning | | 0 |
| Multi-Objective Provisioning of Network Slices using Deep Reinforcement Learning | | 0 |
| Structural Similarity for Improved Transfer in Reinforcement Learning | | 0 |
| Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control | | 0 |
| Dynamic Shielding for Reinforcement Learning in Black-Box Environments | | 0 |
| A Contact-Safe Reinforcement Learning Framework for Contact-Rich Robot Manipulation | | 0 |
| Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms | Code | 0 |
| Unsupervised Training for Neural TSP Solver | | 0 |
| Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning | | 0 |
Page 205 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |