SOTAVerified

Reinforcement Learning (RL)

Reinforcement Learning (RL) trains an agent to take actions in an environment so as to maximize a cumulative reward signal. The agent interacts with the environment and learns from feedback in the form of rewards or penalties for its actions. The goal is to find an optimal policy, that is, a decision-making strategy that maximizes long-term reward.
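
As a rough illustration of the agent-environment loop described above, here is a minimal sketch using tabular Q-learning, one common RL algorithm chosen only as an example. The five-state corridor environment, the function names (`step`, `train`, `greedy_policy`), and all hyperparameter values are assumptions made for this sketch and do not come from any paper listed on this page.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 5-state corridor.
# The agent starts in state 0; reaching state 4 ends the episode with reward +1.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Return (next_state, reward, done) for one environment transition."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn action values from reward feedback."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, sometimes explore; break ties randomly.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target; no future value after a terminal state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return [ACTIONS[0 if row[0] > row[1] else 1] for row in q[:-1]]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this sketch the reward arrives only at the final state, so the discount factor `gamma` is what propagates long-term value back to earlier states; the learned greedy policy should move right in every non-terminal state.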

Papers

Showing 8976-9000 of 15113 papers

| Title | Status | Hype |
|---|---|---|
| A Modular and Transferable Reinforcement Learning Framework for the Fleet Rebalancing Problem | | 0 |
| Context-aware taxi dispatching at city-scale using deep reinforcement learning | | 0 |
| Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning | Code | 0 |
| Safe Model-based Off-policy Reinforcement Learning for Eco-Driving in Connected and Automated Hybrid Electric Vehicles | | 0 |
| Transfer Learning and Curriculum Learning in Sokoban | | 0 |
| Unbiased Asymmetric Reinforcement Learning under Partial Observability | | 0 |
| Trajectory Modeling via Random Utility Inverse Reinforcement Learning | | 0 |
| Towards Scalable Verification of Deep Reinforcement Learning | Code | 0 |
| KnowSR: Knowledge Sharing among Homogeneous Agents in Multi-agent Reinforcement Learning | | 0 |
| A Generalised Inverse Reinforcement Learning Framework | | 0 |
| Bayesian Nonparametric Reinforcement Learning in LTE and Wi-Fi Coexistence | | 0 |
| A Comparison of Reward Functions in Q-Learning Applied to a Cart Position Problem | Code | 0 |
| Interpretable UAV Collision Avoidance using Deep Reinforcement Learning | | 0 |
| FNAS: Uncertainty-Aware Fast Neural Architecture Search | | 0 |
| IGO-QNN: Quantum Neural Network Architecture for Inductive Grover Oracularization | | 0 |
| Verification of Dissipativity and Evaluation of Storage Function in Economic Nonlinear MPC using Q-Learning | | 0 |
| Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence | | 0 |
| Room Clearance with Feudal Hierarchical Reinforcement Learning | | 0 |
| An Efficient Application of Neuroevolution for Competitive Multiagent Learning | Code | 0 |
| Attention-based Reinforcement Learning for Real-Time UAV Semantic Communication | | 0 |
| Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Making by Reinforcement Learning | | 0 |
| An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap | | 0 |
| Certification of Iterative Predictions in Bayesian Neural Networks | Code | 0 |
| De-Biased Modelling of Search Click Behavior with Reinforcement Learning | | 0 |
| Rule Augmented Unsupervised Constituency Parsing | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PPG | Mean Normalized Performance | 0.76 | | Unverified |
| 2 | PPO | Mean Normalized Performance | 0.58 | | Unverified |