
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a random belief, that is, a sample of the unknown model parameters from their current posterior distribution, and then chooses the action that maximizes the expected reward under that sampled belief.
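To make the "sample a belief, then act greedily on it" loop concrete, here is a minimal sketch for one common special case. It assumes binary (Bernoulli) rewards with a Beta(1, 1) prior on each arm's success probability, so the posterior stays a Beta distribution and updating it is just counting. The function name `thompson_sampling`, the simulated `true_probs` arms, and the seed are hypothetical and used only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical per-arm success probabilities, hidden from the agent.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) prior, i.e. uniform over each arm's success probability.
    alpha = [1] * k  # 1 + observed successes
    beta = [1] * k   # 1 + observed failures
    pulls = [0] * k
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one belief per arm from its current posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...and pick the arm that looks best under that sampled belief.
        arm = max(range(k), key=lambda i: samples[i])

        # Simulate a Bernoulli reward from the (hidden) true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Conjugate posterior update: success -> alpha, failure -> beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, reward = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
    print("pulls per arm:", pulls, "total reward:", reward)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get tried (exploration), while an arm whose posterior concentrates on a high value wins most draws (exploitation). Other reward models swap in a different prior and posterior update but keep the same loop.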

Papers

Showing 631-640 of 655 papers

Title | Status | Hype
Thompson Sampling for Budgeted Multi-armed Bandits | | 0
Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems | | 0
A Note on Information-Directed Sampling and Thompson Sampling | | 0
Bandit Convex Optimization: √T Regret in One Dimension | | 0
Thompson sampling with the online bootstrap | | 0
Freshness-Aware Thompson Sampling | | 0
Towards Optimal Algorithms for Prediction with Expert Advice | | 0
Thompson Sampling for Learning Parameterized Markov Decision Processes | | 0
Efficient Learning in Large-Scale Combinatorial Semi-Bandits | | 0
An Information-Theoretic Analysis of Thompson Sampling | | 0

No leaderboard results yet.