
Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment

2021-02-16

Bingshan Hu, Zhiming Huang, Nishant A. Mehta, Nidhi Hegde



Abstract

In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O\left(\sum_{j:\Delta_j>0} \frac{\log(T)}{\min\{\Delta_j,\epsilon\}}\right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an $\Omega\left(\frac{\log(K)}{\min\{\Delta_{\min},\epsilon\}}\right)$ instance-dependent regret lower bound and an $\Omega\left(\sqrt{T\log(K)} + \frac{\log(K)}{\epsilon}\right)$ minimax lower bound, where $K$ is the total number of actions and $\Delta_{\min}$ denotes the minimum suboptimality gap among all the suboptimal actions. For the same differentially private full information setting, we also present an $\epsilon$-differentially private algorithm whose instance-dependent regret and worst-case regret match our respective lower bounds up to an extra $\log(T)$ factor.
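To make the bandit setting concrete, the sketch below shows a generic $\epsilon$-differentially private UCB loop in which empirical arm means are released through the Laplace mechanism (noise scale $1/(\epsilon \cdot n_j)$) and the confidence bonus carries an extra privacy term of order $\log(t)/(\epsilon \cdot n_j)$, matching the $\min\{\Delta_j,\epsilon\}$ shape of the regret bound. This is an illustrative sketch under assumed Bernoulli rewards, not the paper's exact algorithm; all function names and constants here are assumptions.

```python
import math
import random


def laplace_noise(scale, rng):
    # Sample Laplace(0, scale) by inverse-CDF from a uniform in (-0.5, 0.5).
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)


def dp_ucb(means, T, eps, seed=0):
    """Generic epsilon-DP UCB sketch for Bernoulli arms.

    Each arm's empirical mean is privatized with Laplace noise of scale
    1/(eps * n_j); the index adds the usual UCB bonus plus a privacy
    bonus of order log(t)/(eps * n_j). Illustrative only.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K          # n_j: number of pulls of arm j
    sums = [0.0] * K          # cumulative reward of arm j
    total = 0.0               # cumulative reward over all rounds
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1       # pull each arm once to initialize
        else:
            indices = []
            for j in range(K):
                noisy_mean = sums[j] / counts[j] + laplace_noise(
                    1.0 / (eps * counts[j]), rng
                )
                bonus = (
                    math.sqrt(2.0 * math.log(t) / counts[j])
                    + 2.0 * math.log(t) / (eps * counts[j])
                )
                indices.append(noisy_mean + bonus)
            arm = max(range(K), key=indices.__getitem__)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

Running `dp_ucb([0.9, 0.5], T=2000, eps=1.0)` concentrates pulls on the better arm while every released statistic is $\epsilon$-differentially private with respect to a single reward; smaller `eps` inflates both the noise and the bonus, which is exactly where the $1/\min\{\Delta_j,\epsilon\}$ dependence in the regret comes from.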
