Variational oracle guiding for reinforcement learning

2021-09-29 · ICLR 2022

Dongqi Han, Tadashi Kozuno, Xufang Luo, Zhao-Yun Chen, Kenji Doya, Yuqing Yang, Dongsheng Li

Abstract

How to make intelligent decisions is a central problem in machine learning and cognitive science. Despite recent successes of deep reinforcement learning (RL) in various decision-making problems, an important but under-explored question is how to leverage oracle observation (information that is invisible during online decision making but available during offline training) to facilitate learning. For example, human experts review the replay after a poker game, where they can check the opponents' hands, to improve how well they estimate those hands from the information visible during play. In this work, we study such problems based on Bayesian theory and derive an objective for leveraging oracle observation in RL using a variational method. Our key contribution is a general learning framework for deep RL, referred to as variational latent oracle guiding (VLOG). VLOG has desirable properties, including robust and promising performance and the versatility to be combined with any value-based deep RL algorithm. We empirically demonstrate the effectiveness of VLOG in online and offline RL domains using decision-making tasks ranging from video games to the challenging tile-based game Mahjong. Furthermore, we publish the Mahjong environment and the corresponding offline RL dataset as a benchmark to facilitate future research on oracle guiding.
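
As a rough illustration of what a variational objective of this kind can look like, the block below writes out a generic conditional evidence lower bound. It is a sketch under stated assumptions, not necessarily the exact objective derived in the paper. All symbols are introduced here for illustration: $o$ is the observation visible during online play, $\hat{o}$ is the oracle observation available only during offline training, $z$ is a latent variable, and $y$ is a learning target (for example, a value target) assumed to depend on $o$ only through $z$.

```latex
% Generic conditional ELBO (illustrative; symbols are assumptions, not the paper's notation).
% Prior:      p(z | o)          uses only information available online.
% Posterior:  q(z | o, \hat{o})  may additionally use the oracle observation.
\begin{align}
\log p(y \mid o)
  &= \log \int p(y \mid z)\, p(z \mid o)\, \mathrm{d}z \\
  &\ge \mathbb{E}_{q(z \mid o, \hat{o})}\!\left[ \log p(y \mid z) \right]
     - \mathrm{KL}\!\left( q(z \mid o, \hat{o}) \,\middle\|\, p(z \mid o) \right)
\end{align}
```

Under these assumptions, maximizing the right-hand side fits the target through the oracle-informed posterior while the KL term pulls the prior, which is the only part usable at decision time, toward that posterior.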
