SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior distribution and chooses the action that maximizes the expected reward under that draw.
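As a rough illustration of the idea described above, here is a minimal sketch for the simplest setting: a Bernoulli bandit with independent Beta(1, 1) priors on each arm. The function names (thompson_step, run_bandit), the uniform prior, and the simulated reward probabilities are assumptions made for this example; they do not come from any of the papers listed on this page.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one plausible mean per arm from its Beta posterior.

    Assumes Bernoulli rewards and a Beta(1, 1) prior on every arm, so the
    posterior for an arm is Beta(successes + 1, failures + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the sampled belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling against hypothetical Bernoulli arms."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Calling run_bandit([0.1, 0.5, 0.9], 2000) returns per-arm success and failure counts. Because the posterior of a poorly performing arm concentrates on low values, its samples rarely win the argmax, so pulls tend to shift toward the best arm over time while uncertain arms still get explored occasionally.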

Papers

Showing 91–100 of 655 papers

Title | Status | Hype
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0
Asymptotic Convergence of Thompson Sampling | | 0
Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | | 0
Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification | | 0
Asynchronous Multi Agent Active Search | | 0
Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization | | 0
An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization | | 0
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems | | 0
Adaptive Sensor Placement for Continuous Spaces | | 0

No leaderboard results yet.