Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
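The idea can be sketched concretely for the Bernoulli bandit: maintain a Beta posterior over each arm's reward probability, draw one sample per arm, and pull the arm whose sample is largest. The arm indices, reward probabilities, and round count below are illustrative assumptions, not from the source; this is a minimal sketch, not a production implementation.

```python
import random

def thompson_sampling(successes, failures):
    """Choose an arm by drawing one sample from each arm's Beta
    posterior and picking the arm with the highest sampled value."""
    samples = [random.betavariate(s + 1, f + 1)  # Beta(1, 1) uniform prior
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Simulated Bernoulli bandit with hypothetical true reward probabilities.
true_probs = [0.3, 0.5, 0.7]
successes = [0, 0, 0]
failures = [0, 0, 0]

random.seed(0)
for _ in range(2000):
    arm = thompson_sampling(successes, failures)
    reward = 1 if random.random() < true_probs[arm] else 0
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1

# Over many rounds, the highest-probability arm should receive most pulls.
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```

Because each action is chosen by maximizing against a *sampled* belief rather than the posterior mean, under-explored arms still get pulled occasionally, which is how the heuristic balances exploration against exploitation.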

Papers

Showing 61–70 of 655 papers

- An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization (Hype: 0)
- An Empirical Evaluation of Thompson Sampling (Hype: 0)
- A Federated Online Restless Bandit Framework for Cooperative Resource Allocation (Hype: 0)
- Adjusted Expected Improvement for Cumulative Regret Minimization in Noisy Bayesian Optimization (Hype: 0)
- Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling (Hype: 0)
- A Distributed Neural Linear Thompson Sampling Framework to Achieve URLLC in Industrial IoT (Hype: 0)
- Active Reinforcement Learning with Monte-Carlo Tree Search (Hype: 0)
- A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation (Hype: 0)
- A Note on Information-Directed Sampling and Thompson Sampling (Hype: 0)
- Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification (Hype: 0)
