
Off-policy estimation with adaptively collected data: the power of online learning

2024-11-19

Jeonghwan Lee, Cong Ma


Abstract

We consider estimation of a linear functional of the treatment effect using adaptively collected data. This task finds a variety of applications, including off-policy evaluation (OPE) in contextual bandits and estimation of the average treatment effect (ATE) in causal inference. While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties, including semi-parametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill this gap, we first establish generic upper bounds on the mean-squared error of the class of AIPW estimators that crucially depend on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we also propose a general reduction scheme that allows one to produce a sequence of estimates for the treatment effect via online learning so as to minimize the sequentially weighted estimation error. To illustrate this, we provide three concrete instantiations in (1) the tabular case; (2) the case of linear function approximation; and (3) the case of general function approximation for the outcome model. We then provide a local minimax lower bound to show the instance-dependent optimality of the AIPW estimator using no-regret online learning algorithms.
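The AIPW form described above combines a direct (outcome-model) term with an inverse-propensity-weighted correction, where the outcome-model estimate used at round t is built only from earlier rounds, matching the sequential structure the abstract emphasizes. The following is a minimal tabular sketch under an illustrative setup (a single-context, two-arm bandit with deterministic rewards and a uniform target policy); all names and the toy environment are assumptions for illustration, not the paper's exact construction:

```python
import random

def aipw_estimate(data, pi, f_hats, n_actions):
    """AIPW value estimate from adaptively collected data.

    data:    list of (context, action, reward, propensity-of-chosen-action)
    pi:      target policy, pi(a, x) = probability of action a in context x
    f_hats:  f_hats[t] is the outcome-model estimate fitted on rounds < t
    """
    total = 0.0
    for t, (x, a, r, e) in enumerate(data):
        f = f_hats[t]
        # Direct (plug-in) term under the target policy.
        direct = sum(pi(b, x) * f(x, b) for b in range(n_actions))
        # Inverse-propensity-weighted correction of the model's residual.
        correction = pi(a, x) / e * (r - f(x, a))
        total += direct + correction
    return total / len(data)

# --- toy tabular instantiation (hypothetical, for illustration only) ---
mu = {(0, 0): 1.0, (0, 1): 0.5}      # true mean rewards, single context x = 0
pi = lambda a, x: 0.5                # target policy: uniform over the 2 arms

def make_f(counts, sums):
    # Snapshot of running per-(context, action) means: the tabular
    # "online learning" update for the outcome model.
    c, s = dict(counts), dict(sums)
    return lambda x, a: s[(x, a)] / c[(x, a)] if c[(x, a)] else 0.0

random.seed(0)
data, f_hats = [], []
counts = {k: 0 for k in mu}
sums = {k: 0.0 for k in mu}
for t in range(200):
    f_hats.append(make_f(counts, sums))  # estimate available before round t
    e = 0.5                              # behavior propensity of arm 0
    a = 0 if random.random() < e else 1
    r = mu[(0, a)]                       # deterministic reward for clarity
    data.append((0, a, r, e))
    counts[(0, a)] += 1
    sums[(0, a)] += r

v_hat = aipw_estimate(data, pi, f_hats, n_actions=2)
true_v = 0.5 * (1.0 + 0.5)               # target-policy value = 0.75
```

Once both arms have been sampled, the running-mean outcome model is exact here, the correction term vanishes, and each round contributes the true value 0.75; only the first few rounds deviate, so `v_hat` lands very close to `true_v`. The sequential weighting in the paper's bounds corresponds to how these early model errors enter the estimator.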
