SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
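The description above can be made concrete with a minimal sketch for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The function names `thompson_step` and `run_bandit` and the example reward probabilities are hypothetical, chosen for illustration only. Each round draws one belief per arm from its Beta posterior and plays the arm whose drawn belief is highest.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample one belief per arm from its Beta posterior and return the best arm."""
    # Beta(s + 1, f + 1) is the posterior under an assumed uniform Beta(1, 1) prior.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit; return per-arm counts."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # The environment returns reward 1 with the arm's (hidden) probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.2, 0.5, 0.8], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because the drawn beliefs are random, poorly explored arms are still tried occasionally, while arms with strong evidence of high reward are chosen most of the time.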

Papers

Showing 541-550 of 655 papers

| Title | Status | Hype |
|---|---|---|
| Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0 |
| New Insights into Bootstrapping for Bandits | | 0 |
| Analysis of Thompson Sampling for Graphical Bandits Without the Graphs | | 0 |
| PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits | | 0 |
| Profitable Bandits | | 0 |
| Thompson Sampling for Combinatorial Semi-Bandits | | 0 |
| Active Reinforcement Learning with Monte-Carlo Tree Search | | 0 |
| Satisficing in Time-Sensitive Bandit Learning | | 0 |
| Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0 |
| Efficient Exploration through Bayesian Deep Q-Networks | Code | 0 |

No leaderboard results yet.