SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
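The definition above can be made concrete with a minimal sketch. It assumes Bernoulli (0/1) rewards and a uniform Beta(1, 1) prior on each arm's mean, neither of which is specified on this page. The function names `thompson_step` and `run_bandit` are hypothetical, chosen only for illustration. Each round, one mean is drawn per arm from its posterior (the "randomly drawn belief"), and the arm with the largest draw is played.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw one plausible mean per arm from its Beta posterior; return the argmax."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds=2000, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Illustrative arm means, not taken from any paper listed here.
    print(run_bandit([0.2, 0.5, 0.8]))
```

Because the draws are random, arms with few observations still get selected occasionally (exploration), while arms whose posteriors concentrate on high means are selected most of the time (exploitation).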

Papers

Showing 571-580 of 655 papers

Title | Status | Hype
Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs | Code | 0
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling | Code | 0
Adaptive Interventions with User-Defined Goals for Health Behavior Change | Code | 0
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0
MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation | Code | 0
Queueing Matching Bandits with Preference Feedback | Code | 0
Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes | Code | 0
On Provably Robust Meta-Bayesian Optimization | Code | 0
Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures | Code | 0
Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0

No leaderboard results yet.