
Guided Exploration in Deep Reinforcement Learning

2018-09-27

Sahisnu Mazumder, Bing Liu, Shuai Wang, Yingxuan Zhu, Xiaotian Yin, Lifeng Liu, Jian Li, Yongbing Huang


Abstract

This paper proposes a new method to drastically speed up deep reinforcement learning (deep RL) training for problems that have the property of state-action permissibility (SAP). Two types of permissibility are defined under SAP. The first type says that after an action a_t is performed in a state s_t and the agent reaches the new state s_{t+1}, the agent can decide whether the action a_t is permissible or not permissible in state s_t. The second type says that even without performing the action a_t in state s_t, the agent can already decide whether a_t is permissible or not in s_t. An action is not permissible in a state if the action can never lead to an optimal solution and thus should not be tried. We incorporate the proposed SAP property into two state-of-the-art deep RL algorithms to guide their state-action exploration. Results show that the SAP guidance can markedly speed up training.
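
To make the two permissibility types concrete, here is a minimal sketch in Python. It is not the authors' implementation: the 1-D corridor task, the constant `GOAL`, and the functions `step`, `permissible_after`, `permissible_before`, and `select_action` are all hypothetical names chosen for illustration. The sketch assumes a toy setting where the dynamics are known, so that the second type of check can be computed directly; it then shows one plausible way such a check could restrict epsilon-greedy exploration.

```python
import random
from typing import Callable, Sequence

# Hypothetical toy task: an agent on an integer line tries to reach GOAL.
GOAL = 5
LEFT, RIGHT = 0, 1


def step(state: int, action: int) -> int:
    """Toy deterministic dynamics (an assumption of this sketch)."""
    return state + (1 if action == RIGHT else -1)


def permissible_after(state: int, action: int, next_state: int) -> bool:
    """Type 1: judged only after s_{t+1} has been observed.

    In this toy task, a move that does not bring the agent closer to the
    goal can never be part of a shortest path, so it is not permissible.
    """
    return abs(GOAL - next_state) < abs(GOAL - state)


def permissible_before(state: int, action: int) -> bool:
    """Type 2: judged without executing the action.

    Possible here only because the toy dynamics are known in advance.
    """
    return permissible_after(state, action, step(state, action))


def select_action(
    q_values: Sequence[float],
    state: int,
    is_permissible: Callable[[int, int], bool],
    epsilon: float,
    rng: random.Random,
) -> int:
    """Epsilon-greedy selection restricted to permissible actions."""
    actions = list(range(len(q_values)))
    allowed = [a for a in actions if is_permissible(state, a)]
    if not allowed:
        # Fall back to the full action set if everything was filtered out.
        allowed = actions
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore among permissible actions only
    return max(allowed, key=lambda a: q_values[a])  # exploit


# Usage: LEFT has the higher Q-value, but it is filtered out at state 2.
action = select_action([5.0, 1.0], 2, permissible_before, 0.0, random.Random(0))
```

In this sketch the filter only narrows the candidate set, so both the random and the greedy branch skip actions judged non-permissible. When only the first type of check is available, the verdict arrives after the transition and could instead be stored alongside it for later use; how the paper itself uses such verdicts is not specified in the abstract.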
