SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
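As a rough illustration of the description above, the following minimal sketch applies Thompson sampling to a Bernoulli bandit. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm's mean reward; the names `thompson_step` and `run_bandit`, and the example reward probabilities, are hypothetical and chosen only for this sketch.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm and acting greedily on it."""
    # Sample a plausible mean reward for each arm from its Beta posterior.
    # The Beta(1, 1) prior (the "+ 1" terms) is an assumption of this sketch.
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # Pick the arm whose sampled mean is largest.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, rounds, seed=0):
    """Simulate a Bernoulli bandit and return per-arm success/failure counts."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with the arm's true probability.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.5, 0.9], rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because each arm is selected with the probability that it is best under the current posterior, poorly performing arms are still tried occasionally early on, while pulls concentrate on the strongest arm as evidence accumulates.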

Papers

Showing 551-560 of 655 papers

| Title | Status | Hype |
|---|---|---|
| Cascading Bandits for Large-Scale Recommendation Problems | Code | 0 |
| Causal Bandits for Linear Structural Equation Models | Code | 0 |
| Thompson Sampling: An Asymptotically Optimal Finite Time Analysis | Code | 0 |
| Scalable Exploration via Ensemble++ | Code | 0 |
| Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0 |
| Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0 |
| Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics | Code | 0 |
| Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0 |
| Machine Learning for Online Algorithm Selection under Censored Feedback | Code | 0 |
| Stacked Thompson Bandits | Code | 0 |

No leaderboard results yet.