SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
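As a rough illustration of the description above, here is a minimal sketch for a Bernoulli bandit. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm; the function name `thompson_sampling` and its parameters are hypothetical and not taken from any listed paper. On each round, one belief per arm is drawn from its Beta posterior, and the arm that is best under those drawn beliefs is played.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_means: hypothetical success probabilities of each arm (unknown to the agent).
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior
        # (Beta(1, 1) prior updated with the observed successes and failures).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under the drawn belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Posterior update for the played arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get explored, while arms with consistently high observed reward are played most often. Other reward models would need a different posterior than the Beta distribution assumed here.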

Papers

Showing 631–640 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0 |
| Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox | Code | 0 |
| Efficient Exploration through Bayesian Deep Q-Networks | Code | 0 |
| Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations | Code | 0 |
| Thompson Sampling for Linearly Constrained Bandits | Code | 0 |
| Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages | Code | 0 |
| Tsetlin Machine for Solving Contextual Bandit Problems | Code | 0 |
| Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0 |
| Bandit Learning with Implicit Feedback | Code | 0 |
| Automated Creative Optimization for E-Commerce Advertising | Code | 0 |

No leaderboard results yet.