SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
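As a minimal sketch of the description above, the following assumes a Bernoulli bandit with an independent Beta(1, 1) prior on each arm. The function name `thompson_sampling`, the simulated reward probabilities, and the seed are illustrative assumptions and do not come from any paper listed here. In each round the code draws one belief per arm from its posterior, plays the arm that looks best under that draw, and updates the posterior with the observed reward.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) (uniform) prior on each arm, stored as pseudo-counts.
    alpha = [1] * n_arms  # 1 + observed successes
    beta = [1] * n_arms   # 1 + observed failures
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly draw a belief: one sampled success probability per arm.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated environment: Bernoulli reward from the true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the arm that was played.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

Because the sampled beliefs are noisy while the posterior is wide, under-explored arms still get played occasionally. As evidence accumulates the draws concentrate, and most pulls should go to the arm with the highest true success probability.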

Papers

Showing 621-630 of 655 papers

Title | Status | Hype
Thompson Sampling is Asymptotically Optimal in General Environments | - | 0
Convolutional Monte Carlo Rollouts in Go | - | 0
Efficient Thompson Sampling for Online Matrix-Factorization Recommendation | - | 0
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits | - | 0
TSEB: More Efficient Thompson Sampling for Policy Learning | - | 0
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0
Bootstrapped Thompson Sampling and Deep Exploration | - | 0
On the Prior Sensitivity of Thompson Sampling | - | 0
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Code | 0
Belief Flows of Robust Online Learning | - | 0

No leaderboard results yet.