SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
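
As a rough illustration of the definition above, here is a minimal sketch for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The names `thompson_step` and `run_bandit` and the arm probabilities are hypothetical, chosen only for this example. Each round draws one belief per arm from its posterior and plays the arm whose drawn belief has the highest expected reward.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm from its Beta posterior.

    Assumes a Beta(1, 1) prior, so the posterior for an arm with s successes
    and f failures is Beta(s + 1, f + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # For Bernoulli rewards, the sampled probability is the expected reward
    # under the drawn belief, so we play the arm with the largest sample.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with the arm's true probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.2, 0.5, 0.8], rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they still occasionally produce the largest sample and get explored; as evidence accumulates, the draws concentrate and play shifts toward the arm that looks best.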

Papers

Showing 261–270 of 655 papers

Title | Status | Hype
On Kernelized Multi-Armed Bandits with Constraints | - | 0
Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking | Code | 0
Thompson Sampling on Asymmetric α-Stable Bandits | - | 0
Regenerative Particle Thompson Sampling | - | 0
Multi-Agent Active Search using Detection and Location Uncertainty | - | 0
An Analysis of Ensemble Sampling | - | 0
Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes | Code | 0
Partial Likelihood Thompson Sampling | - | 0
Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework | - | 0
Thompson Sampling with Unrestricted Delays | - | 0

No leaderboard results yet.