SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
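The definition above can be sketched concretely for the Bernoulli bandit case: maintain a Beta posterior per arm, draw one sample from each posterior (the "randomly drawn belief"), and play the arm whose sample is largest. This is a minimal illustrative sketch, not code from any listed paper; the arm means, priors, and round count are assumptions chosen for the example.

```python
import random

def thompson_step(successes, failures):
    """One round of Thompson sampling for Bernoulli arms.

    Draw a reward estimate for each arm from its Beta(successes+1, failures+1)
    posterior (uniform prior) and return the index of the largest draw.
    """
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

# Simulate a 3-armed Bernoulli bandit with hypothetical true means.
random.seed(0)
true_means = [0.2, 0.5, 0.8]
successes = [0, 0, 0]
failures = [0, 0, 0]
for _ in range(2000):
    arm = thompson_step(successes, failures)
    if random.random() < true_means[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

pulls = [s + f for s, f in zip(successes, failures)]
```

Because each arm's posterior concentrates as it is pulled, exploration of the weaker arms tapers off automatically and the best arm ends up played most often.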

Papers

Showing 591–600 of 655 papers

Title | Status | Hype
Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits | Code | 0
Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health | Code | 0
Sub-sampling for Efficient Non-Parametric Bandit Exploration | Code | 0
Information-Directed Selection for Top-Two Algorithms | Code | 0
Thompson Sampling for a Fatigue-aware Online Recommendation System | Code | 0
Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0
Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems | Code | 0
More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling | Code | 0
Mostly Exploration-Free Algorithms for Contextual Bandits | Code | 0
