Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
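As a rough illustration of "choosing the action that maximizes the expected reward with respect to a randomly drawn belief", here is a minimal sketch for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm's success probability. That prior and the names `thompson_step` and `run_bernoulli_bandit` are illustrative choices, not something specified on this page.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw one belief per arm from its Beta posterior and pick the best draw."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the sampled (not the mean) belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bernoulli_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling on a bandit with hypothetical true success rates."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bernoulli_bandit([0.1, 0.5, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because each arm is chosen in proportion to how often its sampled belief is the largest, poorly understood arms still get tried occasionally, while pulls concentrate on the arm that looks best as evidence accumulates.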

Papers

Showing 431-440 of 655 papers

Title | Status | Hype
SPRT-based Efficient Best Arm Identification in Stochastic Bandits | | 0
Stable Thompson Sampling: Valid Inference via Variance Inflation | | 0
Stage-wise Conservative Linear Bandits | | 0
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits | | 0
Stochastically Constrained Best Arm Identification with Thompson Sampling | | 0
Stochastic Neural Network with Kronecker Flow | | 0
Streaming kernel regression with provably adaptive mean, variance, and regularization | | 0
Surrogate modeling for Bayesian optimization beyond a single Gaussian process | | 0
Synthetically Controlled Bandits | | 0
Taming Non-stationary Bandits: A Bayesian Approach | | 0

No leaderboard results yet.