
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
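
As an illustration of the description above, here is a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability; the function name `thompson_sampling`, the arm probabilities, and the round count are hypothetical choices made for this example, not taken from any listed paper. On each round, one belief per arm is drawn from its Beta posterior, and the arm whose drawn belief has the highest expected reward is played.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the pull count per arm.

    true_probs holds the (hypothetical) success probability of each arm and
    is used only to simulate rewards, never by the decision rule itself.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior,
        # Beta(successes + 1, failures + 1) under the assumed Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward and update the chosen arm's counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

Arms with few observations have wide posteriors, so they occasionally produce the largest sample and get explored; as evidence accumulates, the posteriors narrow and most pulls go to the arm with the highest estimated reward.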

Papers

Showing 321–330 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles | | 0 |
| Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification | | 0 |
| Expected Improvement-based Contextual Bandits | | 0 |
| Deep Exploration for Recommendation Systems | | 0 |
| Vaccine allocation policy optimization and budget sharing mechanism using Thompson sampling | Code | 0 |
| Online Learning of Network Bottlenecks via Minimax Paths | | 0 |
| Machine Learning for Online Algorithm Selection under Censored Feedback | Code | 0 |
| Thompson Sampling for Bandits with Clustered Arms | | 0 |
| A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0 |
| A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems | | 0 |

No leaderboard results yet.