SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
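The idea above can be sketched for the Bernoulli multi-armed bandit: keep a Beta posterior over each arm's success probability, sample one value per arm from those posteriors, and play the arm whose sample is largest. This is a minimal illustrative sketch, not the implementation of any paper listed below; the function name and the `true_probs` simulation helper are assumptions for the demo.

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors.

    `true_probs` simulates the environment for the demo; the agent only
    sees rewards. Each arm's posterior is Beta(successes+1, failures+1).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Randomly draw one belief: sample each arm's posterior mean estimate.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        # Act greedily with respect to that random draw.
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward
```

Because the posterior of a clearly inferior arm rarely produces the largest sample, exploration of bad arms decays automatically while uncertain arms keep getting tried; over enough rounds the best arm receives most of the pulls.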

Papers

Showing 451–475 of 655 papers

| Title | Status | Hype |
|---|---|---|
| Making Sense of Reinforcement Learning and Probabilistic Inference | | 0 |
| Randomized Exploration for Non-Stationary Stochastic Linear Bandits | Code | 0 |
| Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling | | 0 |
| Ordinal Bayesian Optimisation | | 0 |
| Thompson Sampling and Approximate Inference | | 0 |
| Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0 |
| Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0 |
| Automatic Ensemble Learning for Online Influence Maximization | | 0 |
| Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures | Code | 0 |
| Information-Theoretic Confidence Bounds for Reinforcement Learning | | 0 |
| Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling | | 0 |
| Incentivized Exploration for Multi-Armed Bandits under Reward Drift | | 0 |
| Safe Linear Thompson Sampling with Side Information | | 0 |
| On Online Learning in Kernelized Markov Decision Processes | | 0 |
| On Batch Bayesian Optimization | | 0 |
| Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | | 0 |
| Thompson Sampling via Local Uncertainty | Code | 0 |
| Fixed-Confidence Guarantees for Bayesian Best-Arm Identification | | 0 |
| Thompson Sampling in Non-Episodic Restless Bandits | | 0 |
| Regret Analysis of Bandit Problems with Causal Background Knowledge | | 0 |
| Old Dog Learns New Tricks: Randomized UCB for Bandit Problems | Code | 0 |
| Robust Dynamic Assortment Optimization in the Presence of Outlier Customers | | 0 |
| A Quantile-based Approach for Hyperparameter Transfer Learning | | 0 |
| A Copula approach for hyperparameter transfer learning | | 0 |
| Efficient Multivariate Bandit Algorithm with Path Planning | | 0 |
Page 19 of 27
