SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback, receiving a reward or a penalty for each action. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes the expected long-term reward.
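The agent-environment loop described above can be illustrated with a minimal sketch. It uses tabular Q-learning, one common RL algorithm chosen here only for illustration, on a hypothetical five-state chain where the only reward is given for reaching the rightmost state. The environment, the names (`step`, `train`, `greedy_policy`), and the hyperparameter values are all assumptions made for this example and do not come from any paper listed on this page.

```python
import random

N_STATES = 5        # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment dynamics: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reward only on reaching the goal state
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)    # explore
            else:
                best = max(q[state])            # exploit, breaking ties randomly
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this toy setting the learned greedy policy moves right in every state, which is the strategy that maximizes the discounted long-term reward. The discount factor `gamma` controls how strongly future rewards count relative to immediate ones.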

Papers

Showing 4451-4475 of 15113 papers

Title | Status | Hype
VacSIM: Learning Effective Strategies for COVID-19 Vaccine Distribution using Reinforcement Learning | Code | 0
Multi-Agent Connected Autonomous Driving using Deep Reinforcement Learning | Code | 0
Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks | Code | 0
Multi-agent Cooperative Games Using Belief Map Assisted Training | Code | 0
On-Policy Trust Region Policy Optimisation with Replay Buffers | Code | 0
Value-Free Policy Optimization via Reward Partitioning | Code | 0
On Practical Reinforcement Learning: Provable Robustness, Scalability, and Statistical Efficiency | Code | 0
Meta Reinforcement Learning with Task Embedding and Shared Policy | Code | 0
Cooperative Multi-Agent Reinforcement Learning with Hypergraph Convolution | Code | 0
Reinforcement Learning Decoders for Fault-Tolerant Quantum Computation | Code | 0
Value Iteration for Learning Concurrently Executable Robotic Control Tasks | Code | 0
Value Iteration Networks | Code | 0
Reinforcement Learning Discovers Efficient Decentralized Graph Path Search Strategies | Code | 0
Value Prediction Network | Code | 0
NerveNet: Learning Structured Policy with Graph Neural Networks | Code | 0
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning | Code | 0
Vanilla Gradient Descent for Oblique Decision Trees | Code | 0
Policy Learning for Malaria Control | Code | 0
Policy Learning Using Weak Supervision | Code | 0
MDP Playground: An Analysis and Debug Testbed for Reinforcement Learning | Code | 0
Policy Mirror Descent with Lookahead | Code | 0
Variance Networks: When Expectation Does Not Meet Your Expectations | Code | 0
Variance Reduction based Experience Replay for Policy Optimization | Code | 0
Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach | Code | 0
Reinforcement Learning-enhanced Shared-account Cross-domain Sequential Recommendation | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 | - | Unverified
2 | PPO | Mean Normalized Performance | 0.58 | - | Unverified