
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
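The idea above can be sketched concretely for Bernoulli arms: maintain a Beta posterior per arm, sample one value from each posterior, and play the arm whose sample is largest. This is a minimal illustrative sketch, not any specific paper's implementation; the arm success rates `true_probs` are made-up toy values.

```python
import random

def thompson_sampling(true_probs, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy bandit.

    `true_probs` holds the arms' (unknown to the learner) success rates;
    each arm keeps a Beta(successes + 1, failures + 1) posterior,
    starting from a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior belief...
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        # ...and play the arm that maximizes reward under that sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward

succ, fail, reward = thompson_sampling([0.3, 0.5, 0.7])
pulls = [s + f for s, f in zip(succ, fail)]
print(pulls, reward)
```

Because a confidently bad arm rarely produces the highest posterior sample, pulls concentrate on the best arm over time, which is exactly how the random posterior draw balances exploration against exploitation.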

Papers

Showing 261–270 of 655 papers

Title | Status | Hype
Adaptively Learning to Select-Rank in Online Platforms | | 0
Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation | | 0
From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information | | 0
Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization | | 0
Bayesian Optimization-Based Beam Alignment for MmWave MIMO Communication Systems | | 0
Feel-Good Thompson Sampling for Contextual Dueling Bandits | | 0
Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient? | | 0
Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits | | 0
First-Order Bayesian Regret Analysis of Thompson Sampling | | 0
A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | | 0
Page 27 of 66
