
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
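As an illustration of the description above, here is a minimal sketch for the Bernoulli bandit case. It assumes binary rewards and independent uniform Beta(1, 1) priors on each arm's success probability; neither assumption comes from this page. The function names `thompson_step` and `run_bandit` are hypothetical and chosen only for this example.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm: draw one belief per arm, then act greedily on the draws.

    Assumes a Beta(1, 1) prior per arm, so the posterior for an arm with
    s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # The "randomly drawn belief" is the vector of samples; pick its argmax.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate a Bernoulli bandit and return per-arm success/failure counts."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run_bandit([0.2, 0.5, 0.8], n_rounds=2000)
    pulls = [a + b for a, b in zip(s, f)]
    print("pulls per arm:", pulls)
```

Early on the posteriors are wide, so the draws vary and every arm gets tried (exploration). As evidence accumulates, the draws concentrate near each arm's empirical mean and the best arm is chosen most of the time (exploitation).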

Papers

Showing 441–450 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits | | 0 |
| Cramming Contextual Bandits for On-policy Statistical Evaluation | | 0 |
| The Effect of Communication on Noncooperative Multiplayer Multi-Armed Bandit Problems | | 0 |
| The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits | | 0 |
| The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle | | 0 |
| The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation | | 0 |
| The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling | | 0 |
| The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies | | 0 |
| The Typical Behavior of Bandit Algorithms | | 0 |
| Thompson Exploration with Best Challenger Rule in Best Arm Identification | | 0 |

No leaderboard results yet.