SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
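
As an illustration of "maximizing expected reward with respect to a randomly drawn belief", here is a minimal sketch for a Bernoulli bandit. It assumes binary rewards and an independent uniform Beta(1, 1) prior on each arm; these choices, along with the names `thompson_select` and `run_bandit`, are illustrative assumptions and are not taken from any of the papers listed on this page.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample per arm from its Beta posterior and return the argmax.

    With a Beta(1, 1) prior (an assumption of this sketch), the posterior for
    an arm with s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate a Bernoulli bandit and return per-arm success/failure counts."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        # The sampled means are the "randomly drawn belief"; act greedily on them.
        arm = thompson_select(successes, failures, rng)
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.5, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Arms whose posteriors are still wide occasionally produce the largest sample, which is where exploration comes from; as counts accumulate, the posteriors narrow and the choice concentrates on the arm with the highest estimated mean.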

Papers

Showing 581-590 of 655 papers

Title | Status | Hype
Calibrated Fairness in Bandits | - | 0
A Practical Method for Solving Contextual Bandit Problems Using Decision Trees | - | 0
Bandit Models of Human Behavior: Reward Processing in Mental Disorders | - | 0
Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space | - | 0
Thompson Sampling for the MNL-Bandit | - | 0
Scalable Generalized Linear Bandits: Online Computation and Hashing | - | 0
Asynchronous Parallel Bayesian Optimisation via Thompson Sampling | Code | 0
A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data | - | 0
AIXIjs: A Software Demo for General Reinforcement Learning | Code | 0
Ensemble Sampling | - | 0
