SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) involves training an agent to take actions in an environment to maximize a cumulative reward signal. The agent interacts with the environment and learns by receiving feedback in the form of rewards or punishments for its actions. The goal of reinforcement learning is to find the optimal policy or decision-making strategy that maximizes the long-term reward.
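The interaction loop described above can be sketched with tabular Q-learning on a toy problem. This is only an illustration under stated assumptions: the five-state corridor environment, the `step`, `train`, and `greedy_policy` helpers, and all hyperparameter values are hypothetical and do not come from any paper listed on this page.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 5-state corridor.
# The agent starts in state 0; reaching state 4 yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right
GAMMA = 0.9         # discount factor applied to future rewards


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning: estimate action values from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore; break ties randomly.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: immediate reward plus discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each non-terminal state."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

In this sketch the learned greedy policy moves right in every non-terminal state, which is the reward-maximizing strategy for the corridor. Real RL problems typically have stochastic dynamics and far larger state spaces, so function approximation usually replaces the table.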

Papers

Showing 9851–9900 of 15113 papers

Title | Status | Hype
MODRL/D-AM: Multiobjective Deep Reinforcement Learning Algorithm Using Decomposition and Attention Model for Multiobjective Optimization |  | 0
MODRL/D-EL: Multiobjective Deep Reinforcement Learning with Evolutionary Learning for Multiobjective Optimization |  | 0
Modular Architecture for StarCraft II with Deep Reinforcement Learning |  | 0
Modularity benefits reinforcement learning agents with competing homeostatic drives |  | 0
Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment |  | 0
Modulated Policy Hierarchies |  | 0
Modulating Reservoir Dynamics via Reinforcement Learning for Efficient Robot Skill Synthesis |  | 0
MoET: Interpretable and Verifiable Reinforcement Learning via Mixture of Expert Trees |  | 0
Molecular Design in Synthetically Accessible Chemical Space via Deep Reinforcement Learning |  | 0
Molecular Generative Adversarial Network with Multi-Property Optimization |  | 0
Mollification Effects of Policy Gradient Methods |  | 0
Momentum in Reinforcement Learning |  | 0
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking |  | 0
MONAS: Multi-Objective Neural Architecture Search using Reinforcement Learning |  | 0
MONEYBaRL: Exploiting pitcher decision-making using Reinforcement Learning |  | 0
Monitoring Fidelity of Online Reinforcement Learning Algorithms in Clinical Trials |  | 0
Monte-Carlo Planning and Learning with Language Action Value Estimates |  | 0
Monte Carlo Planning with Large Language Model for Text-Based Game Agents |  | 0
Monte-Carlo Siamese Policy on Actor for Satellite Image Super Resolution |  | 0
Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning |  | 0
Monte-Carlo Tree Search for Policy Optimization |  | 0
Moody Learners -- Explaining Competitive Behaviour of Reinforcement Learning Agents |  | 0
MOORe: Model-based Offline-to-Online Reinforcement Learning |  | 0
Moral reinforcement learning using actual causation |  | 0
More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences |  | 0
More Efficient Off-Policy Evaluation through Regularized Targeted Learning |  | 0
(More) Efficient Reinforcement Learning via Posterior Sampling |  | 0
MOReL: Model-Based Offline Reinforcement Learning |  | 0
More Robust Doubly Robust Off-policy Evaluation |  | 0
MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models |  | 0
MOT: A Mixture of Actors Reinforcement Learning Method by Optimal Transport for Algorithmic Trading |  | 0
MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding |  | 0
Motion Perception in Reinforcement Learning with Dynamic Objects |  | 0
Motion Planner Augmented Reinforcement Learning for Robot Manipulation in Obstructed Environments |  | 0
Motion Planning by Reinforcement Learning for an Unmanned Aerial Vehicle in Virtual Open Space with Static Obstacles |  | 0
Motion Planning for Autonomous Vehicles in the Presence of Uncertainty Using Reinforcement Learning |  | 0
Motion Prediction on Self-driving Cars: A Review |  | 0
MotionRL: Align Text-to-Motion Generation to Human Preferences with Multi-Reward Reinforcement Learning |  | 0
Motivating Physical Activity via Competitive Human-Robot Interaction |  | 0
MP3: Movement Primitive-Based (Re-)Planning Policy |  | 0
MPC4RL -- A Software Package for Reinforcement Learning based on Model Predictive Control |  | 0
MPC-based Reinforcement Learning for a Simplified Freight Mission of Autonomous Surface Vehicles |  | 0
MPC-based Reinforcement Learning for Economic Problems with Application to Battery Storage |  | 0
MQES: Max-Q Entropy Search for Efficient Exploration in Continuous Reinforcement Learning |  | 0
MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server |  | 0
MRAC-RL: A Framework for On-Line Policy Adaptation Under Parametric Model Uncertainty |  | 0
MSDF: A Deep Reinforcement Learning Framework for Service Function Chain Migration |  | 0
MS-Ranker: Accumulating Evidence from Potentially Correct Candidates for Answer Selection |  | 0
MSRL: Distributed Reinforcement Learning with Dataflow Fragments |  | 0
MSVIPER: Improved Policy Distillation for Reinforcement-Learning-Based Robot Navigation |  | 0
Page 198 of 303

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | PPG | Mean Normalized Performance | 0.76 |  | Unverified
2 | PPO | Mean Normalized Performance | 0.58 |  | Unverified