SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief, that is, a set of model parameters sampled from the current posterior distribution.
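A minimal sketch of the idea, assuming the common Beta-Bernoulli setting: each arm pays a reward of 0 or 1 with an unknown probability, and each arm starts from an independent Beta(1, 1) (uniform) prior. The function name `thompson_sampling` and the simulated arm probabilities are hypothetical and chosen only for illustration; the code is not taken from any of the papers listed below.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated K-armed bandit.

    Hypothetical illustration: true_probs are made-up success probabilities
    that the agent never sees directly; it only observes 0/1 rewards.
    Returns (per-arm pull counts, total reward).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) prior per arm: alpha = 1 + successes, beta = 1 + failures.
    alpha = [1] * k
    beta = [1] * k
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one belief per arm: a plausible success probability from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act greedily with respect to the sampled belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate the environment's Bernoulli reward for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: the posterior stays a Beta distribution.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    counts, total = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
    print("pulls per arm:", counts, "total reward:", total)
```

Because each round acts on a random posterior sample rather than the posterior mean, arms with wide (uncertain) posteriors are still tried occasionally, while arms whose posteriors concentrate on low values are pulled less and less. This is how the randomly drawn belief balances exploration against exploitation.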

Papers

Showing 511-520 of 655 papers

Title | Status | Hype
KLUCB Approach to Copeland Bandits | - | 0
First-Order Bayesian Regret Analysis of Thompson Sampling | - | 0
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model | - | 0
Thompson Sampling for a Fatigue-aware Online Recommendation System | Code | 0
Parallel Contextual Bandits in Wireless Handover Optimization | - | 0
Information-Directed Exploration for Deep Reinforcement Learning | Code | 0
MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation | Code | 0
Thompson Sampling for Noncompliant Bandits | - | 0
Bandit Learning with Implicit Feedback | Code | 0
Optimal Learning for Dynamic Coding in Deadline-Constrained Multi-Channel Networks | - | 0

No leaderboard results yet.