
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
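
The selection rule described above can be sketched for the simplest case. This is a minimal illustration, not a reference implementation: it assumes Bernoulli (0/1) rewards and a uniform Beta(1, 1) prior on each arm's success probability, neither of which is stated in the description. The names `thompson_select` and `run_bandit` are hypothetical.

```python
import random


def thompson_select(successes, failures, rng=random):
    """Draw one plausible mean reward per arm from its Beta posterior
    (the "randomly drawn belief") and return the arm with the largest draw."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(s + 1, f + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate a Bernoulli bandit and return per-arm success/failure counts."""
    rng = random.Random(seed)
    successes = [0] * len(true_means)
    failures = [0] * len(true_means)
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they occasionally produce the largest draw and get explored. As evidence accumulates, the draws concentrate around each arm's empirical mean and the better arm is chosen more and more often.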

Papers

Showing 471-480 of 655 papers

Title | Status | Hype
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems | Code | 0
Robust Dynamic Assortment Optimization in the Presence of Outlier Customers | | 0
A Quantile-based Approach for Hyperparameter Transfer Learning | | 0
A Copula approach for hyperparameter transfer learning | | 0
Efficient Multivariate Bandit Algorithm with Path Planning | | 0
An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits | | 0
Online Causal Inference for Advertising in Real-Time Bidding Auctions | | 0
A Batched Multi-Armed Bandit Approach to News Headline Testing | | 0
A Bayesian Choice Model for Eliminating Feedback Loops | | 0
Thompson Sampling with Approximate Inference | | 0

No leaderboard results yet.