SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a random belief (a sample from the posterior distribution over the unknown reward model) and chooses the action that maximizes the expected reward under that sampled belief.
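A minimal sketch of the idea for the common Beta-Bernoulli case follows. It assumes each arm pays a reward of 0 or 1 with an unknown probability and that every arm starts from a uniform Beta(1, 1) prior. The function name `thompson_sampling` and the simulated arm probabilities are hypothetical illustrations, not taken from any of the papers listed here.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, hidden from the agent.
    Returns (per-arm pull counts, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's success probability.
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success probability per arm from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act greedily with respect to the randomly drawn belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward using the hidden true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: the posterior stays Beta, so updating is just counting.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    pulls, reward = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000, seed=42)
    print("pulls per arm:", pulls, "total reward:", reward)
```

Exploration comes entirely from the randomness of the posterior draw: an arm with few observations has a wide posterior and occasionally produces the highest sample, while an arm that keeps paying off has a posterior that concentrates near a high value and wins most draws.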

Papers

Showing papers 531-540 of 655

| Title | Status | Hype |
|---|---|---|
| Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0 |
| Sequential Monte Carlo Bandits | Code | 0 |
| Deep Contextual Multi-armed Bandits | | 0 |
| Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits | | 0 |
| Optimization of a SSP's Header Bidding Strategy using Thompson Sampling | | 0 |
| Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems | | 0 |
| On The Differential Privacy of Thompson Sampling With Gaussian Prior | | 0 |
| Randomized Value Functions via Multiplicative Normalizing Flows | Code | 0 |
| Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling | | 0 |
| An Information-Theoretic Analysis for Thompson Sampling with Many Actions | | 0 |

No leaderboard results yet.