SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
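The idea can be sketched for the simplest case, a Bernoulli bandit with independent Beta(1, 1) priors on each arm's success probability. This is an illustrative sketch only (the function name, arm probabilities, and round count are assumptions for the example, not taken from any paper below): each round, draw one sample from every arm's posterior, play the arm whose sampled mean is largest, and update that arm's posterior with the observed reward.

```python
import random

def thompson_sampling(true_probs, n_rounds=10000, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k  # posterior for arm i is Beta(successes[i] + 1, failures[i] + 1)
    failures = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's Beta posterior (the "randomly drawn belief")...
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        # ...and play the arm that maximizes expected reward under that belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the chosen arm and update its posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward
```

Because posterior draws for under-explored arms remain spread out, the algorithm keeps occasionally trying them, while arms whose posteriors concentrate on high means get pulled most often; this is how the random belief draw balances exploration against exploitation.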

Papers

Showing 526-550 of 655 papers

Title | Status | Hype
Thompson Sampling Algorithms for Cascading Bandits | - | 0
Contextual Multi-Armed Bandits for Causal Marketing | - | 0
Efficient Linear Bandits through Matrix Sketching | - | 0
Incorporating Behavioral Constraints in Online AI Systems | - | 0
Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms | - | 0
Adaptive Grey-Box Fuzz-Testing with Thompson Sampling | - | 0
Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0
Sequential Monte Carlo Bandits | Code | 0
Deep Contextual Multi-armed Bandits | - | 0
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits | - | 0
Optimization of a SSP's Header Bidding Strategy using Thompson Sampling | - | 0
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems | - | 0
On The Differential Privacy of Thompson Sampling With Gaussian Prior | - | 0
Randomized Value Functions via Multiplicative Normalizing Flows | Code | 0
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling | - | 0
An Information-Theoretic Analysis for Thompson Sampling with Many Actions | - | 0
Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0
New Insights into Bootstrapping for Bandits | - | 0
Analysis of Thompson Sampling for Graphical Bandits Without the Graphs | - | 0
PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits | - | 0
Profitable Bandits | - | 0
Thompson Sampling for Combinatorial Semi-Bandits | - | 0
Active Reinforcement Learning with Monte-Carlo Tree Search | - | 0
Satisficing in Time-Sensitive Bandit Learning | - | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0
Page 22 of 27

No leaderboard results yet.