SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
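The sketch below shows the "randomly drawn belief" idea in its most common textbook setting. It assumes Bernoulli rewards and an independent Beta(1, 1) prior on each arm's success probability. The function name `thompson_sampling` and the arm probabilities are hypothetical and serve only as illustration. Each round draws one sample per arm from its posterior, plays the arm with the largest sample, and applies the conjugate Beta update to that arm.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Minimal Beta-Bernoulli Thompson sampling sketch (illustrative only).

    true_probs: hypothetical per-arm success probabilities, hidden from the agent
                and used only to simulate rewards.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's success probability.
    alpha = [1] * k
    beta = [1] * k
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Randomly draw a belief: one sample from each arm's current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Choose the action that maximizes expected reward under that sampled belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulated environment: Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: successes go to alpha, failures go to beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


pulls, total = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
print(pulls, total)
```

Over many rounds, most pulls should go to the arm with the highest success probability. Posterior sampling still tries weaker arms occasionally while their posteriors remain wide, and this is how the method trades exploration against exploitation without an explicit exploration parameter.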

Papers

Showing 181 to 190 of 655 papers

Title | Status | Hype
A Batched Multi-Armed Bandit Approach to News Headline Testing | | 0
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine | | 0
Deciding What to Learn: A Rate-Distortion Approach | | 0
Decentralized Multi-Agent Active Search and Tracking when Targets Outnumber Agents | | 0
Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | | 0
Debiasing Samples from Online Learning Using Bootstrap | | 0
Asymptotic Convergence of Thompson Sampling | | 0
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models | | 0
Cover Tree Bayesian Reinforcement Learning | | 0

No leaderboard results yet.