
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. In each round, it draws a random belief (a sample of the model parameters from the current posterior) and chooses the action that maximizes the expected reward under that sampled belief.
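A minimal sketch of the idea, assuming the simplest setting: Bernoulli rewards with an independent Beta(1, 1) (uniform) prior on each arm's success probability, so the posterior after observing rewards is again a Beta distribution. The function name `thompson_sampling` and the arm probabilities in the usage line are hypothetical, chosen only to simulate an environment; they do not come from any particular paper listed below.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, hidden from the agent.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior on every arm; alpha/beta count 1 + successes / 1 + failures.
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # Choose the arm with the highest expected reward under that belief.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the (hypothetical) environment.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a success raises alpha, a failure raises beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000))
```

Exploration here comes entirely from posterior sampling: arms with few observations have wide posteriors and are occasionally sampled high, while the arm with the best estimated reward is chosen most of the time as its posterior concentrates.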

Papers

Showing 321-330 of 655 papers

Title | Status | Hype
Non-Stationary Latent Bandits | | 0
No Regrets for Learning the Prior in Bandits | | 0
Observation-Free Attacks on Stochastic Bandits | | 0
On Adaptive Estimation for Dynamic Bernoulli Bandits | | 0
On Batch Bayesian Optimization | | 0
On Dynamic Pricing with Covariates | | 0
On Efficiency in Hierarchical Reinforcement Learning | | 0
On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise | | 0
On Kernelized Multi-Armed Bandits with Constraints | | 0
On learning Whittle index policy for restless bandits with scalable regret | | 0

No leaderboard results yet.