SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief at random from the current posterior over the unknown reward model and chooses the action that maximizes the expected reward under that drawn belief.
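As an illustration of the description above, here is a minimal sketch for one common special case: a bandit with Bernoulli (0/1) rewards and a Beta posterior per arm. The uniform Beta(1, 1) prior, the function names `thompson_select` and `run_bandit`, and the simulated reward probabilities are all assumptions made for this example; they do not come from any of the papers listed on this page.

```python
import random


def thompson_select(successes, failures, rng):
    """Sample one belief per arm from its Beta posterior and return the best arm.

    Assumes a uniform Beta(1, 1) prior, so the posterior for an arm with
    s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the randomly drawn belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (hypothetical setup)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_probs[arm].
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.5, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they are still occasionally sampled as the best arm (exploration), while arms with strong evidence of high reward are chosen most of the time (exploitation).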

Papers

Showing 101-110 of 655 papers

Title | Status | Hype
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0
Automated Creative Optimization for E-Commerce Advertising | Code | 0
Online Learning of Decision Trees with Thompson Sampling | Code | 0
On Provably Robust Meta-Bayesian Optimization | Code | 0
Information-Directed Selection for Top-Two Algorithms | Code | 0
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Code | 0
Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach | Code | 0
Bandit Learning with Implicit Feedback | Code | 0
Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0

No leaderboard results yet.