SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
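The idea above can be made concrete with the standard Beta-Bernoulli case: keep a Beta posterior per arm, draw one sample from each posterior, and pull the arm whose sample is largest. This is a minimal illustrative sketch (function names and the simulation loop are not from any paper listed below):

```python
import random

def thompson_step(successes, failures):
    """Pick an arm by sampling each arm's Beta(s+1, f+1) posterior
    and choosing the arm with the largest sampled mean."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_probs, horizon=5000, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit with the
    given (unknown to the agent) success probabilities."""
    random.seed(seed)
    k = len(true_probs)
    successes, failures = [0] * k, [0] * k
    for _ in range(horizon):
        arm = thompson_step(successes, failures)
        if random.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because each arm is chosen with the posterior probability that it is the best, suboptimal arms are sampled less and less often as evidence accumulates, which is exactly the exploration-exploitation trade-off the papers below analyze and extend.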

Papers

Showing 601–650 of 655 papers

Title | Status | Hype
Stacked Thompson Bandits | Code | 0
Thompson Sampling For Stochastic Bandits with Graph Feedback | | 0
Estimating Quality in Multi-Objective Bandits Optimization | | 0
Exploration for Multi-task Reinforcement Learning with Deep Generative Models | | 0
Nonparametric General Reinforcement Learning | | 0
Linear Thompson Sampling Revisited | | 0
Unimodal Thompson Sampling for Graph-Structured Arms | | 0
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits | | 0
A Formal Solution to the Grain of Truth Problem | | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | | 0
Human collective intelligence as distributed Bayesian inference | | 0
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits | | 0
Online Algorithms For Parameter Mean And Variance Estimation In Dynamic Regression Models | | 0
Linear Bandit algorithms using the Bootstrap | | 0
Double Thompson Sampling for Dueling Bandits | Code | 0
An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization | | 0
A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization | | 0
Optimal Recommendation to Users that React: Online Learning for a Class of POMDPs | | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
Simple Bayesian Algorithms for Best Arm Identification | | 0
Thompson Sampling is Asymptotically Optimal in General Environments | | 0
Convolutional Monte Carlo Rollouts in Go | | 0
Efficient Thompson Sampling for Online Matrix-Factorization Recommendation | | 0
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits | | 0
TSEB: More Efficient Thompson Sampling for Policy Learning | | 0
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0
Bootstrapped Thompson Sampling and Deep Exploration | | 0
On the Prior Sensitivity of Thompson Sampling | | 0
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Code | 0
Belief Flows of Robust Online Learning | | 0
Thompson Sampling for Budgeted Multi-armed Bandits | | 0
Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems | | 0
A Note on Information-Directed Sampling and Thompson Sampling | | 0
Bandit Convex Optimization: √T Regret in One Dimension | | 0
Thompson sampling with the online bootstrap | | 0
Freshness-Aware Thompson Sampling | | 0
Towards Optimal Algorithms for Prediction with Expert Advice | | 0
Thompson Sampling for Learning Parameterized Markov Decision Processes | | 0
Efficient Learning in Large-Scale Combinatorial Semi-Bandits | | 0
An Information-Theoretic Analysis of Thompson Sampling | | 0
Better Optimism By Bayes: Adaptive Planning with Rich Models | | 0
Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search | | 0
Eluder Dimension and the Sample Complexity of Optimistic Exploration | | 0
Thompson Sampling for Complex Bandit Problems | | 0
Thompson Sampling for Online Learning with Linear Experts | | 0
Generalized Thompson Sampling for Contextual Bandits | | 0
Thompson Sampling in Dynamic Systems for Contextual Bandit Problems | | 0
Thompson Sampling for 1-Dimensional Exponential Family Bandits | | 0
Cover Tree Bayesian Reinforcement Learning | | 0
Prior-free and prior-dependent regret bounds for Thompson Sampling | | 0
