SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a random sample from the current posterior belief over the unknown reward parameters and chooses the action that maximizes the expected reward under that sample.
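The sketch below illustrates the sample-then-act loop in the common Beta-Bernoulli setting. It assumes each arm pays a reward of 0 or 1, that each arm's belief is a Beta distribution starting from a uniform Beta(1, 1) prior, and that the arm probabilities in the example are hypothetical values used only to simulate rewards. The function name `thompson_sampling` is likewise illustrative, not a reference implementation.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, hidden from the agent.
    Returns (pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) is a uniform prior on each arm's success probability.
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one belief per arm from its current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act greedily with respect to the drawn belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the chosen arm's hidden probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a success raises alpha, a failure raises beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    pulls, reward = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
    print("pulls per arm:", pulls)
    print("total reward:", reward)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get tried (exploration), while arms whose posteriors concentrate on high values win most draws (exploitation). No separate exploration parameter is needed in this setting.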

Papers

Showing 81–90 of 655 papers

Title | Status | Hype
A Reinforcement Learning based Reset Policy for CDCL SAT Solvers | | 0
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems | | 0
A Reliability-aware Multi-armed Bandit Approach to Learn and Select Users in Demand Response | | 0
A resource-constrained stochastic scheduling algorithm for homeless street outreach and gleaning edible food | | 0
A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization | | 0
A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits | | 0
A study of Thompson Sampling with Parameter h | | 0
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits | | 0
Asymptotically Optimal Bandits under Weighted Information | | 0
An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization | | 0

No leaderboard results yet.