SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
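
As an illustration of the definition above, here is a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability; that prior and the names `thompson_step` and `run_bandit` are illustrative choices, not something taken from this page. In each round, one plausible mean is drawn per arm from its posterior (the "randomly drawn belief"), and the arm with the highest draw is played.

```python
import random

def thompson_step(successes, failures, rng):
    """Draw one mean per arm from its Beta posterior and return the best arm's index."""
    # Assumed prior: Beta(1, 1), so the posterior is Beta(successes + 1, failures + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit with hypothetical arm means."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, successes, failures

if __name__ == "__main__":
    pulls, _, _ = run_bandit([0.1, 0.9], n_rounds=2000)
    print(pulls)
```

Because the choice is driven by posterior samples rather than point estimates, arms with few observations still get picked occasionally (exploration), while arms that keep paying off are picked more and more often (exploitation).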

Papers

Showing 91-100 of 655 papers

Title | Status | Hype
Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0
MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation | Code | 0
Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0
Asynchronous ε-Greedy Bayesian Optimisation | Code | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
Bandit Learning with Implicit Feedback | Code | 0
Atlas: Automate Online Service Configuration in Network Slicing | Code | 0
Adaptive Interventions with User-Defined Goals for Health Behavior Change | Code | 0
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0
Automated Creative Optimization for E-Commerce Advertising | Code | 0

No leaderboard results yet.