Thompson Sampling on Asymmetric α-Stable Bandits
Zhendong Shi, Ercan E. Kuruoglu, Xiaoli Wei
Abstract
In reinforcement learning, handling the exploration-exploitation dilemma is central to algorithm optimization. The multi-armed bandit problem can optimize proposed solutions by changing the reward distribution to achieve a dynamic balance between exploration and exploitation. Thompson Sampling is a common method for solving the multi-armed bandit problem and has been applied to data following a variety of distributions. In this paper, we consider the Thompson Sampling approach to the multi-armed bandit problem in which rewards follow unknown asymmetric α-stable distributions, and we explore its applications in modelling financial and wireless data.
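
For readers unfamiliar with Thompson Sampling, the generic loop can be sketched as follows. This is only an illustration under simplifying assumptions: it uses Bernoulli rewards with Beta posteriors, not the asymmetric α-stable reward model studied in the paper, and the function name `thompson_sampling_bernoulli` and the arm means are hypothetical choices made for the example.

```python
import random


def thompson_sampling_bernoulli(true_means, rounds, seed=0):
    """Run Beta-Bernoulli Thompson Sampling and return pull counts per arm.

    Assumption: rewards are Bernoulli with the given means. This stands in
    for the paper's alpha-stable rewards purely to show the sampling loop.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one sample per arm from its Beta posterior (uniform prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm whose posterior sample is largest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


# Hypothetical three-armed instance; the last arm has the highest mean.
pulls = thompson_sampling_bernoulli([0.2, 0.5, 0.8], rounds=2000)
print(pulls)
```

Sampling from the posterior, rather than always playing the arm with the best estimate, is what balances exploration and exploitation: uncertain arms still produce large samples occasionally, so they keep being tried until the evidence rules them out.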