SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
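
As a rough illustration of the "randomly drawn belief" idea, here is a minimal sketch for one common special case: Bernoulli (0/1) rewards with an independent Beta(1, 1) prior on each arm. The reward model, the prior, and the names `thompson_select` and `run_bernoulli_bandit` are assumptions made for this example and are not taken from any of the papers listed on this page.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample per arm from its Beta posterior and return the argmax.

    With a Beta(1, 1) prior (an assumption of this sketch), the posterior
    after s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bernoulli_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit; return pulls per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        reward = 1 if rng.random() < true_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Hypothetical two-armed problem; the second arm has the higher mean.
    print(run_bernoulli_bandit([0.1, 0.9], n_rounds=2000))
```

Each round, the sampled values play the role of the randomly drawn belief: the arm that looks best under that draw is pulled, so arms with uncertain posteriors still get explored occasionally while pulls concentrate on the arm that appears best as evidence accumulates.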

Papers

Showing 401–410 of 655 papers

Title | Status | Hype
Causal Bandits without prior knowledge using separating sets | | 0
Thompson Sampling for Unsupervised Sequential Selection | | 0
A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits | | 0
Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling | | 0
Contextual Bandits for Advertising Budget Allocation | | 0
Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses | | 0
Reinforcement Learning with Trajectory Feedback | | 0
Lenient Regret for Multi-Armed Bandits | | 0
IntelligentPooling: Practical Thompson Sampling for mHealth | | 0
Greedy Bandits with Sampled Context | | 0

No leaderboard results yet.