SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each round, it draws a belief at random from the posterior distribution over reward models and chooses the action that maximizes the expected reward under that drawn belief.
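To make the description above concrete, here is a minimal sketch for the simplest setting: Bernoulli rewards with an independent Beta(1, 1) prior on each arm. The function name `thompson_sampling`, the `true_probs` argument, and the simulated environment are assumptions introduced for illustration; they do not come from any of the listed papers.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms (illustrative sketch).

    true_probs: hypothetical success probability of each arm, hidden from the agent.
    Returns (number of pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior per arm, stored as pseudo-counts of successes and failures.
    successes = [1] * n_arms
    failures = [1] * n_arms
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Randomly drawn belief: one plausible mean reward per arm from its posterior.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # Choose the action that maximizes expected reward under the drawn belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated Bernoulli reward from the (assumed) environment.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: the posterior stays a Beta distribution.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward
```

Calling `thompson_sampling([0.2, 0.5, 0.8], 2000)` returns the pull counts and the total reward; under these assumptions the pulls should concentrate on the arm with the highest success probability as its posterior sharpens, while the other arms are still tried occasionally early on.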

Papers

Showing 201-210 of 655 papers

Title | Status | Hype
Diffusion Models Meet Contextual Bandits with Large Action Spaces | | 0
DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation | | 0
Discounted Thompson Sampling for Non-Stationary Bandit Problems | | 0
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning | | 0
Distributed Thompson Sampling | | 0
Adaptive Combinatorial Allocation | | 0
Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes | | 0
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits | | 0
Double-Linear Thompson Sampling for Context-Attentive Bandits | | 0
A Copula approach for hyperparameter transfer learning | | 0

No leaderboard results yet.