SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
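The description above can be made concrete with a minimal sketch, assuming a Bernoulli bandit with a uniform Beta(1, 1) prior on each arm. That model is an assumption for illustration only; the page does not specify one. The function name `thompson_sampling` and its parameters (`true_means`, `n_rounds`, `seed`) are hypothetical. Each round draws one belief per arm from its posterior, plays the arm whose drawn belief is largest, and updates that arm's posterior with the observed reward.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_means: hypothetical success probability of each arm, unknown to the agent.
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior,
        # starting from an assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Posterior update for the arm that was played.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Early on the posteriors are wide, so both arms are tried; as evidence accumulates, samples from the better arm's posterior dominate and exploration of the weaker arm tapers off without any explicit exploration parameter.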

Papers

Showing 601–610 of 655 papers

Title | Status | Hype
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0
Asymptotic Convergence of Thompson Sampling | | 0
Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0
Asynchronous Multi Agent Active Search | | 0
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems | | 0
A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning | | 0
Automatic Ensemble Learning for Online Influence Maximization | | 0
AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning | | 0
Bag of Policies for Distributional Deep Exploration | | 0
BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration | | 0

No leaderboard results yet.