SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
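As a rough illustration of the description above, here is a minimal sketch for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The function name `thompson_sampling` and the parameters `true_probs`, `n_rounds`, and `seed` are hypothetical, chosen for this example only. Each round draws one belief per arm from its posterior and plays the arm whose drawn belief is highest.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative sketch).

    true_probs holds the success probability of each arm; the algorithm never
    reads it directly, it is only used to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior
        # (assumed Beta(1, 1) prior, so the posterior is Beta(s + 1, f + 1)).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate the reward and update the posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they still occasionally produce the highest sample and get explored. As evidence accumulates, the posteriors narrow and play concentrates on the arm that appears best.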

Papers

Showing 541-550 of 655 papers

Title | Status | Hype
Satisficing in Time-Sensitive Bandit Learning | | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | | 0
Scalable Generalized Linear Bandits: Online Computation and Hashing | | 0
Scalable Neural Contextual Bandit for Recommender Systems | | 0
Scalable regret for learning to control network-coupled subsystems with unknown dynamics | | 0
Scalable Thompson Sampling using Sparse Gaussian Process Models | | 0
Scalable Thompson Sampling via Optimal Transport | | 0
Scaling Multi-Armed Bandit Algorithms | | 0
Screening for an Infectious Disease as a Problem in Stochastic Control | | 0
Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization | | 0

No leaderboard results yet.