SOTAVerified

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

2023-02-21Unverified0· sign in to hype

Han Zhong, Jiachen Hu, Yecheng Xue, Tongyang Li, LiWei Wang

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding is limited. In particular, it remains elusive how to design provably efficient quantum RL algorithms that can address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision processes (MDPs) with S states, A actions, and horizon H, and establish an O(poly(S, A, H, T)) worst-case regret for it, where T is the number of episodes. Furthermore, we extend our results to quantum RL with linear function approximation, which is capable of handling problems with large state spaces. Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with d-dimensional linear representation and prove that it enjoys O(poly(d, H, T)) regret. Our algorithms are variants of UCRL/UCRL-VTR algorithms in classical RL, which also leverage a novel combination of lazy updating mechanisms and quantum estimation subroutines. This is the key to breaking the (T)-regret barrier in classical RL. To the best of our knowledge, this is the first work studying the online exploration in quantum RL with provable logarithmic worst-case regret.

Tasks

Reproductions