Estimation and Inference in Distributional Reinforcement Learning
Liangyu Zhang, Yang Peng, Jiadong Liang, Wenhao Yang, Zhihua Zhang
Code: github.com/zhangliangyu32/estimationandinferencedistributionalrl (official, PyTorch)
Abstract
In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, aiming to estimate the complete distribution of the random return (denoted $\eta^\pi$) attained by a given policy $\pi$. We use the certainty-equivalence method to construct our estimator $\hat{\eta}^\pi$, given that a generative model is available. In this circumstance, we show that a dataset of size $\widetilde{O}\bigl(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2p}(1-\gamma)^{2p+2}}\bigr)$ suffices to guarantee that the $p$-Wasserstein metric between $\hat{\eta}^\pi$ and $\eta^\pi$ is less than $\epsilon$ with high probability. This implies the distributional policy evaluation problem can be solved with sample efficiency. We also show that, under different mild assumptions, a dataset of size $\widetilde{O}\bigl(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2}(1-\gamma)^{4}}\bigr)$ suffices to ensure that the Kolmogorov metric and the total variation metric between $\hat{\eta}^\pi$ and $\eta^\pi$ are below $\epsilon$ with high probability. Furthermore, we investigate the asymptotic behavior of $\hat{\eta}^\pi$. We demonstrate that the "empirical process" $\sqrt{n}(\hat{\eta}^\pi-\eta^\pi)$ converges weakly to a Gaussian process in the space of bounded functionals on the Lipschitz function class $\ell^\infty(\mathcal{F}_{W})$, and also in the spaces of bounded functionals on the indicator function class $\ell^\infty(\mathcal{F}_{\mathrm{KS}})$ and the bounded measurable function class $\ell^\infty(\mathcal{F}_{\mathrm{TV}})$, when some mild conditions hold. Our findings give rise to a unified approach to statistical inference for a wide class of statistical functionals of $\eta^\pi$.
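To make the estimator concrete, below is a minimal sketch (not the authors' released code) of certainty-equivalence distributional policy evaluation on a tabular MDP: build an empirical transition kernel $\hat{P}$ from generative-model samples, then iterate the distributional Bellman operator of the empirical MDP on a fixed grid of return atoms. The categorical (two-hot) projection, the helper names (`sample_model`, `policy`, `rewards`), and all hyperparameters are illustrative assumptions, not details taken from the paper.

```python
# A minimal sketch of certainty-equivalence distributional policy evaluation
# on a tabular MDP with rewards assumed in [0, r_max]. Illustrative only.
import numpy as np

def certainty_equivalence_eval(sample_model, policy, rewards, gamma,
                               n_states, n_actions, n_samples,
                               n_atoms=101, n_iters=500, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: empirical transition kernel P_hat from the generative model:
    # draw n_samples next states for every (s, a) pair.
    P_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                P_hat[s, a, sample_model(s, a, rng)] += 1.0 / n_samples
    # Step 2: fixed atom grid covering returns in [0, r_max / (1 - gamma)].
    z = np.linspace(0.0, rewards.max() / (1.0 - gamma), n_atoms)
    eta = np.full((n_states, n_atoms), 1.0 / n_atoms)  # uniform init
    # Step 3: iterate the distributional Bellman operator of the empirical
    # MDP, projecting the shifted/scaled atoms back onto the fixed grid.
    for _ in range(n_iters):
        eta_new = np.zeros_like(eta)
        for s in range(n_states):
            for a in range(n_actions):
                tz = np.clip(rewards[s, a] + gamma * z, z[0], z[-1])
                # two-hot projection of each target atom onto the grid
                idx = np.clip(np.searchsorted(z, tz, side='right') - 1,
                              0, n_atoms - 2)
                w = (tz - z[idx]) / (z[idx + 1] - z[idx])
                mix = P_hat[s, a] @ eta  # mixture over empirical next states
                proj = np.zeros(n_atoms)
                np.add.at(proj, idx, mix * (1.0 - w))
                np.add.at(proj, idx + 1, mix * w)
                eta_new[s] += policy[s, a] * proj
        eta = eta_new
    return z, eta  # atoms and per-state return-distribution weights
```

Replacing $\hat{P}$ with the true kernel would recover exact categorical policy evaluation on the true MDP; the certainty-equivalence estimator is the plug-in version of that computation on the empirical model, which is where the sample-size bounds stated in the abstract enter.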