SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
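The definition above can be illustrated with a minimal sketch, assuming Bernoulli (0/1) rewards and a uniform Beta(1, 1) prior on each arm. The "randomly drawn belief" is one sample per arm from its Beta posterior, and the chosen action is the arm whose sampled mean is largest. The function name `thompson_sampling` and the parameters `true_probs`, `n_rounds`, and `seed` are hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling sketch; returns pull counts per arm.

    Assumption: rewards are 0/1 and each arm starts with a Beta(1, 1) prior.
    `true_probs` is hidden from the learner and only used to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sampled mean per arm from its posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Because arms with few observations have wide posteriors, they are still sampled occasionally (exploration), while arms with high estimated reward are chosen most often (exploitation).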

Papers

Showing 341-350 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| On Online Learning in Kernelized Markov Decision Processes | | 0 |
| On The Differential Privacy of Thompson Sampling With Gaussian Prior | | 0 |
| On the Importance of Uncertainty in Decision-Making with Large Language Models | | 0 |
| On the Performance of Thompson Sampling on Logistic Bandits | | 0 |
| On the Prior Sensitivity of Thompson Sampling | | 0 |
| On Thompson Sampling for Smoother-than-Lipschitz Bandits | | 0 |
| On Thompson Sampling with Langevin Algorithms | | 0 |
| On Frequentist Regret of Linear Thompson Sampling | | 0 |
| Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment | | 0 |
| Optimal Exploration is no harder than Thompson Sampling | | 0 |

No leaderboard results yet.