
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief. Concretely, at each round the algorithm samples model parameters from its current posterior, acts greedily with respect to that sample, and then updates the posterior with the observed reward.
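The mechanism above can be sketched for the simplest case, a Bernoulli bandit with conjugate Beta(1, 1) priors; the function name, arm probabilities, and round count below are illustrative, not from the page:

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Minimal Bernoulli Thompson sampling with Beta(1, 1) priors.

    Per arm we track (successes, failures). Each round we draw one sample
    from every arm's Beta posterior, pull the arm whose sampled mean is
    largest, and update that arm's counts with the observed reward.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        # Draw a plausible mean reward for each arm from its posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because the posterior of a clearly worse arm rarely produces the largest sample, exploration of bad arms decays automatically while uncertain arms keep getting tried, which is exactly the exploration-exploitation balance the heuristic is known for.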

Papers

Showing 101-125 of 655 papers

Title | Status | Hype
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0
Automated Creative Optimization for E-Commerce Advertising | Code | 0
Information-Directed Selection for Top-Two Algorithms | Code | 0
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Code | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
Differentially Private Online Bayesian Estimation With Adaptive Truncation | Code | 0
Randomized Value Functions via Multiplicative Normalizing Flows | Code | 0
Bandit Learning with Implicit Feedback | Code | 0
Adaptive Interventions with User-Defined Goals for Health Behavior Change | Code | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes | Code | 0
Scalable Optimization for Wind Farm Control using Coordination Graphs | Code | 0
Scalable Exploration via Ensemble++ | Code | 0
Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems | Code | 0
Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations | Code | 0
Simple Bayesian Algorithms for Best Arm Identification | Code | 0
Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0
Causal Bandits for Linear Structural Equation Models | Code | 0
Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling | Code | 0
Thompson Sampling Algorithms for Mean-Variance Bandits | Code | 0
Distributed Thompson sampling under constrained communication | Code | 0
Thompson Sampling for Bandit Learning in Matching Markets | Code | 0
Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0
Page 5 of 27

No leaderboard results yet.