SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
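
As an illustration of the definition above, here is a minimal sketch of Thompson sampling for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm's success probability; the function names `thompson_step` and `run_bandit` and the simulated arm probabilities are hypothetical, chosen only for this example. Each round draws one belief per arm from its posterior and then plays the arm that looks best under that draw.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm from its Beta posterior."""
    # Assumption: a uniform Beta(1, 1) prior, so the posterior is Beta(1 + s, 1 + f).
    samples = [
        rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the randomly drawn belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate a Bernoulli bandit (hypothetical arm probabilities) for n_rounds."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_probs[arm].
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Calling `run_bandit([0.2, 0.5, 0.8], 2000)` returns per-arm success and failure counts. Arms with little data have wide posteriors and are still drawn occasionally, which provides exploration; as counts accumulate, the posteriors concentrate and most pulls go to the arm with the highest estimated reward.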

Papers

Showing 451–460 of 655 papers

Title | Status | Hype
Non-Stationary Bandit Learning via Predictive Sampling | | 0
Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed Pricing | | 0
Non-Stationary Latent Bandits | | 0
No Regrets for Learning the Prior in Bandits | | 0
Observation-Free Attacks on Stochastic Bandits | | 0
On Adaptive Estimation for Dynamic Bernoulli Bandits | | 0
On Batch Bayesian Optimization | | 0
On Dynamic Pricing with Covariates | | 0
On Efficiency in Hierarchical Reinforcement Learning | | 0
On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise | | 0

No leaderboard results yet.