Guided Exploration in Deep Reinforcement Learning
Sahisnu Mazumder, Bing Liu, Shuai Wang, Yingxuan Zhu, Xiaotian Yin, Lifeng Liu, Jian Li, Yongbing Huang
Abstract
This paper proposes a new method to drastically speed up deep reinforcement learning (deep RL) training for problems that have the state-action permissibility (SAP) property. SAP defines two types of permissibility. Under the first type, after an action a_t is performed in a state s_t and the agent reaches the new state s_{t+1}, the agent can decide whether a_t was permissible in s_t. Under the second type, the agent can decide whether a_t is permissible in s_t even without performing it. An action is not permissible in a state if it can never lead to an optimal solution and thus should not be tried. We incorporate the proposed SAP property into two state-of-the-art deep RL algorithms to guide their state-action exploration. Results show that the SAP guidance can markedly speed up training.
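To make the two permissibility types concrete, here is a minimal sketch on a hypothetical toy problem: an agent on an integer line that must reach a goal position using actions -1 and +1. The rules in `type1_permissible` and `type2_permissible`, the tabular `q_values` dictionary, and the `select_action` helper are all illustrative assumptions, not the predicates or the deep RL algorithms used in the paper. The sketch only shows where each check could sit: type 2 filters actions before one is chosen, while type 1 labels a transition after it has been observed.

```python
import random


def type1_permissible(state, action, next_state, goal):
    """Type 1: judged only after the transition (s_t, a_t, s_{t+1}) is observed.

    Hypothetical toy rule: the action was permissible if it moved the
    agent strictly closer to the goal.
    """
    return abs(goal - next_state) < abs(goal - state)


def type2_permissible(state, action, goal):
    """Type 2: judged from (s_t, a_t) alone, before the action is performed.

    Hypothetical toy rule: the action is permissible if it points
    toward the goal.
    """
    return (goal - state) * action > 0


def select_action(q_values, state, actions, goal, epsilon, rng):
    """Epsilon-greedy selection restricted to type-2 permissible actions."""
    allowed = [a for a in actions if type2_permissible(state, a, goal)]
    # Fall back to the full action set if the filter removes everything.
    candidates = allowed or list(actions)
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore among permissible actions only
    return max(candidates, key=lambda a: q_values.get((state, a), 0.0))


if __name__ == "__main__":
    rng = random.Random(0)
    actions = (-1, 1)
    state, goal = 0, 5
    action = select_action({}, state, actions, goal, epsilon=0.5, rng=rng)
    next_state = state + action
    print(action, type1_permissible(state, action, next_state, goal))
```

In this toy setting the type-2 filter removes the action that moves away from the goal, so both exploration and greedy selection stay within the permissible set. A type-1 label, being available only after the step, could instead be stored with the transition and used to steer later choices; how the paper actually uses it is not specified in the abstract.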