
Exploration by Uncertainty in Reward Space

2018-09-27

Wei-Yang Qu, Yang Yu, Tang-Jie Lv, Ying-Feng Chen, Chang-Jie Fan


Abstract

Efficient exploration plays a key role in reinforcement learning tasks. Commonly used dithering strategies, such as ε-greedy, explore the state-action space randomly, which can lead to a large demand for samples. In this paper, we propose an exploration method based on uncertainty in reward space. The approach maintains two policies: an exploration policy, used for exploratory sampling in the environment, and a benchmark policy, which is updated with the data provided by the exploration policy. The benchmark policy supplies the uncertainty in reward space, e.g. the TD-error, which guides the updates of the exploration policy. We apply our method to two grid-world environments and four Atari games. Experimental results show that our method improves learning speed and achieves better performance than the baseline policies.
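The two-policy idea in the abstract can be sketched roughly as follows. This is a hypothetical minimal illustration, not the authors' implementation: a tabular "benchmark" Q-table is updated from sampled transitions, and the magnitude of its TD-error serves as a per-state-action uncertainty bonus that steers the exploration policy. The chain environment, learning rate, and the specific bonus rule are all assumptions made for this sketch.

```python
import random

# Hypothetical sketch of TD-error-guided exploration (not the paper's code).
# The benchmark Q-table's TD-error is treated as "uncertainty in reward
# space" and added as a bonus when the exploration policy picks actions.

def td_error(q, s, a, r, s2, gamma=0.9):
    """How wrong the benchmark value estimate is for transition (s,a,r,s2)."""
    target = r + gamma * max(q[s2].values())
    return target - q[s][a]

def explore_action(q, bonus, s, actions, rng):
    """Greedy w.r.t. value plus uncertainty bonus, random tie-breaking."""
    vals = {a: q[s][a] + bonus[s][a] for a in actions}
    best = max(vals.values())
    return rng.choice([a for a in actions if vals[a] == best])

# Tiny 1-D chain: states 0..4, actions step left/right, reward 1 at state 4.
states, actions = range(5), (-1, 1)
q = {s: {a: 0.0 for a in actions} for s in states}       # benchmark policy
bonus = {s: {a: 1.0 for a in actions} for s in states}   # optimistic init
rng, alpha, s = random.Random(0), 0.5, 0

for _ in range(200):
    a = explore_action(q, bonus, s, actions, rng)
    s2 = min(max(s + a, 0), 4)
    r = 1.0 if s2 == 4 else 0.0
    delta = td_error(q, s, a, r, s2)
    q[s][a] += alpha * delta        # benchmark policy update
    bonus[s][a] = abs(delta)        # uncertainty tracks |TD-error|
    s = 0 if s2 == 4 else s2        # reset episode at the goal
```

After training, the benchmark Q-table should prefer stepping right toward the rewarding state, while the bonus table has decayed where the TD-error has become small.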
