SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
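As a rough illustration of the description above, here is a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The function names `thompson_step` and `run_bandit` and the example reward probabilities are hypothetical, chosen only for this sketch. Each round draws one belief per arm from its posterior and plays the arm that looks best under that draw.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample one plausible success rate per arm and return the best-looking arm."""
    # Posterior for each arm is Beta(successes + 1, failures + 1)
    # under the assumed uniform Beta(1, 1) prior.
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, rounds=2000, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit; return pulls per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with the arm's (hidden) probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical arms; the last one has the highest success probability.
    print(run_bandit([0.2, 0.5, 0.8]))
```

Because arms are chosen from posterior samples rather than posterior means, arms with little data still get tried occasionally, while play concentrates on the arm that appears best as evidence accumulates.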

Papers

Showing 526–550 of 655 papers

Title | Status | Hype
Remote Contextual Bandits | | 0
Residual Bootstrap Exploration for Bandit Algorithms | | 0
Revised Progressive-Hedging-Algorithm Based Two-layer Solution Scheme for Bayesian Reinforcement Learning | | 0
Reward Biased Maximum Likelihood Estimation for Reinforcement Learning | | 0
Risk and optimal policies in bandit experiments | | 0
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs | | 0
Risk-Constrained Thompson Sampling for CVaR Bandits | | 0
Robust Dynamic Assortment Optimization in the Presence of Outlier Customers | | 0
Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments | | 0
Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks | | 0
Safe Linear Leveling Bandits | | 0
Safe Linear Thompson Sampling with Side Information | | 0
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit | | 0
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity | | 0
Sampling Acquisition Functions for Batch Bayesian Optimization | | 0
Satisficing in Time-Sensitive Bandit Learning | | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | | 0
Scalable Generalized Linear Bandits: Online Computation and Hashing | | 0
Scalable Neural Contextual Bandit for Recommender Systems | | 0
Scalable regret for learning to control network-coupled subsystems with unknown dynamics | | 0
Scalable Thompson Sampling using Sparse Gaussian Process Models | | 0
Scalable Thompson Sampling via Optimal Transport | | 0
Scaling Multi-Armed Bandit Algorithms | | 0
Screening for an Infectious Disease as a Problem in Stochastic Control | | 0
Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization | | 0

No leaderboard results yet.