Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
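The idea of "maximizing reward with respect to a randomly drawn belief" can be sketched concretely. Below is a minimal, hypothetical Beta-Bernoulli instantiation (the classic setting for Thompson sampling, not code from any paper listed here): each arm's success probability gets a Beta posterior, one sample is drawn from each posterior per round, and the arm with the largest sampled value is played.

```python
import random

def thompson_sampling(true_probs, horizon=2000, seed=0):
    """Sketch of Beta-Bernoulli Thompson sampling for a K-armed bandit.

    `true_probs` are the (unknown-to-the-agent) Bernoulli reward
    probabilities; they are used only to simulate the environment.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) uniform prior on each arm's success probability.
    successes = [1] * k
    failures = [1] * k
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's posterior belief...
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # ...and play the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures, total_reward
```

Because posterior samples for a clearly inferior arm rarely exceed those of the best arm, play concentrates on the best arm over time while still exploring arms whose posteriors remain wide.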

Papers

Showing 626–650 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0 |
| Bootstrapped Thompson Sampling and Deep Exploration | | 0 |
| On the Prior Sensitivity of Thompson Sampling | | 0 |
| Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Code | 0 |
| Belief Flows of Robust Online Learning | | 0 |
| Thompson Sampling for Budgeted Multi-armed Bandits | | 0 |
| Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems | | 0 |
| A Note on Information-Directed Sampling and Thompson Sampling | | 0 |
| Bandit Convex Optimization: √T Regret in One Dimension | | 0 |
| Thompson sampling with the online bootstrap | | 0 |
| Freshness-Aware Thompson Sampling | | 0 |
| Towards Optimal Algorithms for Prediction with Expert Advice | | 0 |
| Thompson Sampling for Learning Parameterized Markov Decision Processes | | 0 |
| Efficient Learning in Large-Scale Combinatorial Semi-Bandits | | 0 |
| An Information-Theoretic Analysis of Thompson Sampling | | 0 |
| Better Optimism By Bayes: Adaptive Planning with Rich Models | | 0 |
| Eluder Dimension and the Sample Complexity of Optimistic Exploration | | 0 |
| Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search | | 0 |
| Thompson Sampling for Complex Bandit Problems | | 0 |
| Thompson Sampling for Online Learning with Linear Experts | | 0 |
| Generalized Thompson Sampling for Contextual Bandits | | 0 |
| Thompson Sampling in Dynamic Systems for Contextual Bandit Problems | | 0 |
| Thompson Sampling for 1-Dimensional Exponential Family Bandits | | 0 |
| Cover Tree Bayesian Reinforcement Learning | | 0 |
| Prior-free and prior-dependent regret bounds for Thompson Sampling | | 0 |
Page 26 of 27