Thompson Sampling on Asymmetric α-Stable Bandits

2022-03-19

Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei

Abstract

In reinforcement learning, handling the exploration-exploitation dilemma is central to algorithm optimization. The multi-armed bandit problem can optimize proposed solutions by changing the reward distribution to strike a dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been applied to data that follow a variety of distributions. In this paper, we consider the Thompson Sampling approach for the multi-armed bandit problem in which rewards follow unknown asymmetric α-stable distributions, and we explore its applications in modelling financial and wireless data.
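
To make the setting concrete, the sketch below simulates a bandit whose arms return asymmetric α-stable rewards and runs a generic Thompson Sampling loop on it. It is not the paper's algorithm: the abstract does not specify the posterior, so the sketch assumes a Gaussian posterior on each arm's mean (N(0, 1) prior, unit-variance likelihood) and clips rewards to tame the heavy tails. The names `sample_alpha_stable` and `thompson_sampling`, the `clip` parameter, and the arm parameters are all hypothetical. Rewards are drawn with the Chambers-Mallows-Stuck method, restricted here to 1 < α ≤ 2 so that each arm's mean exists.

```python
import math
import random


def sample_alpha_stable(alpha, beta, scale, loc, rng):
    """Draw one asymmetric alpha-stable variate (Chambers-Mallows-Stuck method).

    Restricted to 1 < alpha <= 2, where the mean exists and equals `loc`.
    """
    if not (1.0 < alpha <= 2.0 and -1.0 <= beta <= 1.0 and scale > 0.0):
        raise ValueError("need 1 < alpha <= 2, |beta| <= 1, scale > 0")
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = max(rng.expovariate(1.0), 1e-300)  # guard against division by zero
    zeta = -beta * math.tan(math.pi * alpha / 2)
    xi = math.atan(-zeta) / alpha
    # Positive in exact arithmetic; the floor only protects against rounding.
    c = max(math.cos(u - alpha * (u + xi)), 1e-300)
    x = ((1 + zeta * zeta) ** (1 / (2 * alpha))
         * math.sin(alpha * (u + xi)) / math.cos(u) ** (1 / alpha)
         * (c / w) ** ((1 - alpha) / alpha))
    return scale * x + loc


def thompson_sampling(arms, horizon, clip=10.0, seed=0):
    """Generic Gaussian Thompson Sampling (an assumption, not the paper's method).

    `arms` is a list of (alpha, beta, scale, loc) tuples. Returns pull counts.
    """
    rng = random.Random(seed)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for _ in range(horizon):
        # Sample a plausible mean per arm from its posterior:
        # N(0, 1) prior + unit-variance Gaussian likelihood (assumed model).
        draws = [
            rng.gauss(sums[k] / (counts[k] + 1), 1 / math.sqrt(counts[k] + 1))
            for k in range(len(arms))
        ]
        k = max(range(len(arms)), key=draws.__getitem__)
        reward = sample_alpha_stable(*arms[k], rng)
        reward = max(-clip, min(clip, reward))  # crude heavy-tail safeguard
        counts[k] += 1
        sums[k] += reward
    return counts


# Two hypothetical arms with opposite skew; the second has the higher mean.
example_arms = [(1.5, 0.5, 1.0, 0.0), (1.5, -0.5, 1.0, 1.0)]
print(thompson_sampling(example_arms, horizon=500))
```

The clipping step is a deliberately simple stand-in: with α < 2 the reward variance is infinite, so an unclipped running mean can be dominated by a single extreme draw.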

Tasks

Reinforcement Learning (RL), Thompson Sampling
