SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
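The description above can be made concrete with a minimal sketch for the Bernoulli bandit case. This is an illustration under stated assumptions, not a reference implementation: the function name `thompson_bernoulli`, the uniform Beta(1, 1) prior, and the simulated reward probabilities are all hypothetical choices made for the example. Each round, one success probability per arm is drawn from that arm's Beta posterior (the "randomly drawn belief"), and the arm with the largest draw is played.

```python
import random


def thompson_bernoulli(true_probs, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pull counts per arm.

    Assumptions (illustrative only): rewards are 0/1, each arm starts with a
    uniform Beta(1, 1) prior, and `true_probs` holds the hidden success rates
    used solely to simulate the environment.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one plausible success rate per arm from its current posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Observe a simulated Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_bernoulli([0.2, 0.5, 0.8], rounds=2000))
```

Because arms with little data have wide posteriors, they occasionally produce the largest draw and get explored; as evidence accumulates, the draws concentrate and play shifts toward the arm with the highest estimated reward.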

Papers

Showing 341-350 of 655 papers

Title | Status | Hype
Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow | | 0
Random Effect Bandits | | 0
Thompson Sampling for Unimodal Bandits | | 0
Thompson Sampling with a Mixture Prior | | 0
Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian? | | 0
A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms | | 0
Parallelizing Thompson Sampling | | 0
Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits | | 0
Asymptotically Optimal Bandits under Weighted Information | | 0
Diffusion Approximations for Thompson Sampling | | 0

No leaderboard results yet.