SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior distribution (for example, one sample of each arm's unknown reward parameter) and chooses the action that maximizes the expected reward under that sampled belief.
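A minimal sketch of the idea, assuming the common Beta-Bernoulli setting: each arm pays a reward of 0 or 1, and each arm's unknown success probability gets an independent Beta(1, 1) (uniform) prior. The function name `thompson_sampling`, the arm probabilities, and the simulated environment are hypothetical illustrations. They are not taken from any of the papers listed below.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, hidden from the agent.
    Returns (per-arm pull counts, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) prior on every arm: alpha counts successes + 1, beta counts failures + 1.
    alpha = [1] * k
    beta = [1] * k
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its current Beta posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act greedily with respect to the sampled belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulated environment: Bernoulli reward from the hidden probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: the posterior for the pulled arm stays a Beta distribution.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


pulls, reward = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
print(pulls, reward)
```

Arms whose posteriors are still wide occasionally produce the largest sample and get explored. As evidence accumulates, the best arm's posterior concentrates and it wins most draws. Beta-Bernoulli conjugacy reduces the posterior update to two counters per arm. Other reward models need a different posterior or an approximate sampling scheme.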

Papers

Showing 181-190 of 655 papers

Title | Status | Hype
Counterfactual Inference under Thompson Sampling | | 0
Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits | | 0
Cover Tree Bayesian Reinforcement Learning | | 0
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models | | 0
Asymptotic Convergence of Thompson Sampling | | 0
Debiasing Samples from Online Learning Using Bootstrap | | 0
Decentralized Multi-Agent Active Search and Tracking when Targets Outnumber Agents | | 0
Deciding What to Learn: A Rate-Distortion Approach | | 0
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine | | 0
Bayesian Quantile and Expectile Optimisation | | 0

No leaderboard results yet.