Estimation and Inference in Distributional Reinforcement Learning
Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
Code: github.com/zhangliangyu32/estimationandinferencedistributionalrl (official, PyTorch)
Abstract
In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, aiming to estimate the complete distribution of the random return (denoted $\eta^\pi$) attained by a given policy $\pi$. We use the certainty-equivalence method to construct our estimator $\hat{\eta}^\pi$, given that a generative model is available. In this circumstance, we show that a dataset of size $\widetilde{O}\bigl(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2p}(1-\gamma)^{2p+2}}\bigr)$ suffices to guarantee that the $p$-Wasserstein metric between $\hat{\eta}^\pi$ and $\eta^\pi$ is less than $\epsilon$ with high probability. This implies the distributional policy evaluation problem can be solved with sample efficiency. We also show that, under different mild assumptions, a dataset of size $\widetilde{O}\bigl(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2}(1-\gamma)^{4}}\bigr)$ suffices to ensure that the Kolmogorov metric and the total variation metric between $\hat{\eta}^\pi$ and $\eta^\pi$ are below $\epsilon$ with high probability. Furthermore, we investigate the asymptotic behavior of $\hat{\eta}^\pi$. We demonstrate that the "empirical process" $\sqrt{n}(\hat{\eta}^\pi-\eta^\pi)$ converges weakly to a Gaussian process in the space of bounded functionals on the Lipschitz function class $\ell^\infty(\mathcal{F}_{W})$, and also in the spaces of bounded functionals on the indicator function class $\ell^\infty(\mathcal{F}_{\mathrm{KS}})$ and the bounded measurable function class $\ell^\infty(\mathcal{F}_{\mathrm{TV}})$, when some mild conditions hold. Our findings give rise to a unified approach to statistical inference for a wide class of statistical functionals of $\eta^\pi$.
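To make the estimator concrete, below is a minimal sketch (not the authors' released code) of certainty-equivalence distributional policy evaluation on a tabular MDP: build an empirical transition kernel $\hat{P}$ from generative-model samples, then iterate the distributional Bellman operator of the empirical MDP on a fixed grid of return atoms. The categorical (two-hot) projection, the helper names (`sample_model`, `policy`, `rewards`), and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of certainty-equivalence distributional policy evaluation
# on a tabular MDP with rewards assumed in [0, r_max]. Illustrative only.
import numpy as np

def certainty_equivalence_eval(sample_model, policy, rewards, gamma,
                               n_states, n_actions, n_samples,
                               n_atoms=101, n_iters=500, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: empirical transition kernel P_hat from the generative model:
    # draw n_samples next states for every (s, a) pair.
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, sample_model(s, a, rng)] += 1.0 / n_samples
    # Step 2: fixed atom grid covering returns in [0, r_max / (1 - gamma)].
    z = np.linspace(0.0, rewards.max() / (1.0 - gamma), n_atoms)
    eta = np.full((n_states, n_atoms), 1.0 / n_atoms)  # uniform init
    # Step 3: iterate the distributional Bellman operator of the empirical
    # MDP, projecting the shifted/scaled atoms back onto the fixed grid.
    for _ in range(n_iters):
        eta_new = np.zeros_like(eta)
        for s in range(n_states):
            for a in range(n_actions):
                tz = np.clip(rewards[s, a] + gamma * z, z[0], z[-1])
                # two-hot projection of each target atom onto the grid
                idx = np.clip(np.searchsorted(z, tz, side='right') - 1,
                              0, n_atoms - 2)
                w = (tz - z[idx]) / (z[idx + 1] - z[idx])
                mix = P_hat[s, a] @ eta  # mixture over empirical next states
                proj = np.zeros(n_atoms)
                np.add.at(proj, idx, mix * (1.0 - w))
                np.add.at(proj, idx + 1, mix * w)
                eta_new[s] += policy[s, a] * proj
        eta = eta_new
    return z, eta  # atoms and per-state return-distribution weights
```

Replacing $\hat{P}$ with the true kernel would recover exact categorical policy evaluation on the true MDP; the certainty-equivalence estimator is the plug-in version of that computation on the empirical model, which is where the sample-size bounds stated in the abstract enter.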