Regret Minimization and Statistical Inference in Online Decision Making with High-dimensional Covariates
Congyuan Duan, Wanteng Ma, Jiashuo Jiang, Dong Xia
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
This paper investigates regret minimization, statistical inference, and their interplay in high-dimensional online decision-making based on the sparse linear context bandit model. We integrate the -greedy bandit algorithm for decision-making with a hard thresholding algorithm for estimating sparse bandit parameters and introduce an inference framework based on a debiasing method using inverse propensity weighting. Under a margin condition, our method achieves either O(T^1/2) regret or classical O(T^1/2)-consistent inference, indicating an unavoidable trade-off between exploration and exploitation. If a diverse covariate condition holds, we demonstrate that a pure-greedy bandit algorithm, i.e., exploration-free, combined with a debiased estimator based on average weighting can simultaneously achieve optimal O( T) regret and O(T^1/2)-consistent inference. We also show that a simple sample mean estimator can provide valid inference for the optimal policy's value. Numerical simulations and experiments on Warfarin dosing data validate the effectiveness of our methods.