SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
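The interaction loop described above can be sketched with tabular Q-learning on a toy problem. Everything below is an illustrative assumption, not taken from any paper listed on this page: the five-state chain environment, the `step`, `train`, and `greedy_policy` helpers, and the hyperparameter values are all hypothetical. The agent starts at the left end of the chain, receives a reward of 1 only on reaching the right end, and updates its action-value estimates from that feedback.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # feedback arrives only at the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (assumed settings)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # exploit, breaking ties between equal estimates at random
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # one-step target: immediate reward plus discounted future value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Action with the highest estimated value in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Under these assumptions, `greedy_policy(train())` should select the move-right action in every non-terminal state, since that is the strategy that maximizes the discounted long-term reward on this chain.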

Papers

Showing 9876-9900 of 15113 papers

| Title | Status | Hype |
| --- | --- | --- |
| Learn to Play Tetris with Deep Reinforcement Learning | | 0 |
| Learn To Manage Portfolio With Reinforcement Learning | | 0 |
| IPM Move Planner: AN EFFICIENT EXPLOITING DEEP REINFORCEMENT LEARNING WITH MONTE CARLO TREE SEARCH | | 0 |
| Automatic Source Code Summarization via Reinforcement Learning | | 0 |
| Increasing Data Efficiency of Driving Agent By World Model | Code | 0 |
| A case for new neural network smoothness constraints | | 0 |
| Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation | | 0 |
| Learning Mobile Robot Navigation in the Dense Crowd with Deep Reinforcement Learning | | 0 |
| Reinforcement Learning with Subspaces using Free Energy Paradigm | | 0 |
| Tutoring Reinforcement Learning via Feedback Control | | 0 |
| Noise-Robust End-to-End Quantum Control using Deep Autoregressive Policy Networks | | 0 |
| Semi-supervised reward learning for offline reinforcement learning | | 0 |
| OPAC: Opportunistic Actor-Critic | | 0 |
| Regularizing Action Policies for Smooth Control with Reinforcement Learning | | 0 |
| Reinforcement Learning Agents for Ubisoft's Roller Champions | | 0 |
| Performance-Weighed Policy Sampling for Meta-Reinforcement Learning | | 0 |
| Blending MPC & Value Function Approximation for Efficient Reinforcement Learning | | 0 |
| Flatland-RL : Multi-Agent Reinforcement Learning on Trains | | 0 |
| Deep Reinforcement Learning for Stock Portfolio Optimization | | 0 |
| A Deep Reinforcement Learning Approach for Ramp Metering Based on Traffic Video Data | | 0 |
| Deep Reinforcement Learning for Long Term Hydropower Production Scheduling | | 0 |
| Interactive Search Based on Deep Reinforcement Learning | | 0 |
| Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation | | 0 |
| Transfer Learning for Efficient Iterative Safety Validation | | 0 |
| MLComp: A Methodology for Machine Learning-based Performance Estimation and Adaptive Selection of Pareto-Optimal Compiler Optimization Sequences | | 0 |
Page 396 of 605

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |