Deep Q-Learning with Low Switching Cost

2021-01-01

Shusheng Xu, Simon Shaolei Du, Yi Wu

Abstract

We initiate the study of deep reinforcement learning problems that require low switching cost, i.e., a small number of policy switches during training. Such a requirement is ubiquitous in many applications, such as medical domains, recommendation systems, education, robotics, and dialogue agents, where the deployed policy that actually interacts with the environment cannot change frequently. Our paper investigates different policy switching criteria based on deep Q-networks and further proposes an adaptive approach based on the feature distance between the deployed Q-network and the underlying learning Q-network. Through extensive experiments on a medical treatment environment and a collection of Atari games, we find that our feature-switching criterion substantially decreases the switching cost while maintaining sample efficiency similar to the case without the low-switching-cost constraint. We also complement this empirical finding with a theoretical justification from a representation learning perspective.
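To make the idea of a feature-distance switching criterion concrete, here is a minimal sketch in Python. The abstract does not specify the distance measure, the threshold, or the network architecture, so everything below is an assumption chosen for illustration: the function names `mean_feature_similarity` and `should_switch` are hypothetical, cosine similarity stands in for the feature distance, the threshold of 0.98 is arbitrary, and a random linear map stands in for the feature layers of a Q-network.

```python
import numpy as np


def mean_feature_similarity(deployed_feats, learner_feats, eps=1e-8):
    """Mean row-wise cosine similarity between two batches of feature vectors.

    Assumption: cosine similarity is used as the (inverse) feature distance.
    """
    deployed_feats = np.asarray(deployed_feats, dtype=float)
    learner_feats = np.asarray(learner_feats, dtype=float)
    dots = np.sum(deployed_feats * learner_feats, axis=1)
    norms = (np.linalg.norm(deployed_feats, axis=1)
             * np.linalg.norm(learner_feats, axis=1))
    return float(np.mean(dots / (norms + eps)))


def should_switch(deployed_feats, learner_feats, threshold=0.98):
    """Deploy the learner's weights only once its features have drifted enough.

    Assumption: the threshold value is illustrative, not taken from the paper.
    """
    return mean_feature_similarity(deployed_feats, learner_feats) < threshold


# Toy demonstration: a linear feature map stands in for a Q-network's
# feature extractor, and random perturbations stand in for gradient updates.
rng = np.random.default_rng(0)
states = rng.normal(size=(32, 4))       # a batch of sampled states
W_deployed = rng.normal(size=(4, 8))    # deployed feature weights
W_learner = W_deployed.copy()           # learner starts from the same weights

num_steps = 50
switches = 0
for _ in range(num_steps):
    W_learner += 0.05 * rng.normal(size=W_learner.shape)
    if should_switch(states @ W_deployed, states @ W_learner):
        W_deployed = W_learner.copy()   # one policy switch
        switches += 1

print(f"policy switches: {switches} of {num_steps} updates")
```

The point of the sketch is the control flow: the learner is updated at every step, but the deployed weights change only when the two feature representations diverge, so the number of switches can be much smaller than the number of updates.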
