
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

2022-06-24

Yifan Lin, Yuhao Wang, Enlu Zhou



Abstract

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the corresponding reward. In particular, we consider mean-variance as the risk criterion, and the best arm is the one with the largest mean-variance reward. We apply the Thompson Sampling algorithm for the disjoint model, and provide a comprehensive regret analysis for a variant of the proposed algorithm. For $T$ rounds, $K$ actions, and $d$-dimensional feature vectors, we prove a regret bound of $O\left(\left(1+\rho+\frac{1}{\rho}\right) d \ln T \ln\frac{K}{\delta} \sqrt{d K T^{1+2\epsilon} \ln\frac{K}{\delta}\frac{1}{\epsilon}}\right)$ that holds with probability $1-\delta$ under the mean-variance criterion with risk tolerance $\rho$, for any $0<\epsilon<\frac{1}{2}$, $0<\delta<1$. The empirical performance of our proposed algorithms is demonstrated via a portfolio selection problem.
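To make the setting concrete, the loop described in the abstract (contexts revealed per arm, one pull per round, arms scored by a mean-variance criterion) can be sketched as below. This is a minimal illustration, not the paper's algorithm: it assumes the score MV = mean − ρ·variance, a standard Bayesian linear-regression posterior per arm (disjoint model), and a naive empirical-variance estimate from each arm's reward history.

```python
import numpy as np

rng = np.random.default_rng(0)

def mv_thompson_sampling(contexts, pull, T, K, d, rho=1.0, v=0.5):
    """Illustrative Thompson Sampling for a disjoint linear contextual bandit
    under a mean-variance score. The MV convention (mean - rho * variance)
    and the per-arm empirical variance estimate are simplifying assumptions,
    not the construction analyzed in the paper."""
    B = [np.eye(d) for _ in range(K)]    # per-arm precision matrices
    f = [np.zeros(d) for _ in range(K)]  # per-arm feature-weighted reward sums
    hist = [[] for _ in range(K)]        # per-arm observed rewards
    for t in range(T):
        X = contexts(t)                  # shape (K, d): one feature vector per arm
        scores = np.empty(K)
        for k in range(K):
            mu = np.linalg.solve(B[k], f[k])          # posterior mean of theta_k
            cov = v ** 2 * np.linalg.inv(B[k])        # posterior covariance
            theta = rng.multivariate_normal(mu, cov)  # posterior sample
            mean = X[k] @ theta
            var = np.var(hist[k]) if len(hist[k]) > 1 else 0.0
            scores[k] = mean - rho * var              # mean-variance score
        k = int(np.argmax(scores))                    # pull the best-scoring arm
        r = pull(t, k, X[k])
        B[k] += np.outer(X[k], X[k])                  # Bayesian linear-regression update
        f[k] += r * X[k]
        hist[k].append(r)
    return hist
```

A toy run would supply a `contexts(t)` function returning a `(K, d)` array and a `pull(t, k, x)` callback returning the noisy linear reward for the chosen arm; both names are placeholders introduced here for illustration.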
