SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
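As a concrete illustration of the idea described above (not taken from this page), the classic Beta-Bernoulli instantiation keeps a Beta posterior per arm, draws one sample from each posterior, and pulls the arm whose sampled mean is largest. The function name and parameters below are illustrative, and this is only a minimal sketch of the technique:

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy multi-armed bandit.

    `true_probs` holds the hidden success probability of each arm;
    the agent only observes the sampled 0/1 rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) uniform priors: alpha counts successes + 1, beta counts failures + 1.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief sample per arm, then act greedily on the draws:
        # this is the "randomly drawn belief" in the definition above.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8])
```

Because the posterior of the best arm concentrates as it is pulled, exploration of the weaker arms tapers off automatically; over many rounds most pulls go to the arm with the highest true success probability.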

Papers

Showing 176-200 of 655 papers

Title | Status | Hype
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | - | 0
Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space | - | 0
Incentivizing Exploration with Linear Contexts and Combinatorial Actions | - | 0
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages | Code | 0
Combinatorial Neural Bandits | - | 0
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | Code | 1
Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation | - | 0
Discounted Thompson Sampling for Non-Stationary Bandit Problems | - | 0
Sequential Best-Arm Identification with Application to Brain-Computer Interface | - | 0
Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions | - | 0
An improved regret analysis for UCB-N and TS-N | - | 0
Trajectory-oriented optimization of stochastic epidemiological models | Code | 0
Neural Exploitation and Exploration of Contextual Bandits | Code | 1
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards | - | 0
Efficiently Tackling Million-Dimensional Multiobjective Problems: A Direction Sampling and Fine-Tuning Approach | - | 0
Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms | - | 0
GUTS: Generalized Uncertainty-Aware Thompson Sampling for Multi-Agent Active Search | - | 0
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches | - | 0
Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling | - | 0
A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning | - | 0
A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms | - | 0
Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors | - | 0
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | - | 0
When Combinatorial Thompson Sampling meets Approximation Regret | - | 0
Page 8 of 27

No leaderboard results yet.