A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering

2024-10-09Unverified0· sign in to hype

Qihan Qi, Xinsong Yang, Gang Xia, Daniel W. C. Ho, Pengyang Tang

Unverified — Be the first to reproduce this paper.

Abstract

This paper proposes a safety modulator actor-critic (SMAC) method to address safety constraint and overestimation mitigation in model-free safe reinforcement learning (RL). A safety modulator is developed to satisfy safety constraints by modulating actions, allowing the policy to ignore safety constraint and focus on maximizing reward. Additionally, a distributional critic with a theoretical update rule for SMAC is proposed to mitigate the overestimation of Q-values with safety constraints. Both simulation and real-world scenarios experiments on Unmanned Aerial Vehicles (UAVs) hovering confirm that the SMAC can effectively maintain safety constraints and outperform mainstream baseline algorithms.

Tasks

Reinforcement Learning (RL)Safe Reinforcement Learning SMAC SMAC+

A Safety Modulator Actor-Critic Method in Model-Free Safe Reinforcement Learning and Application in UAV Hovering

Abstract

Tasks

Reproductions