SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
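As a minimal sketch of the idea described above, the following Python example applies Thompson sampling to a Bernoulli bandit. The setup is an assumption made for illustration and does not come from any paper listed on this page: each arm pays 0 or 1, the belief about each arm is a Beta posterior starting from a uniform Beta(1, 1) prior, and the names `thompson_step` and `run_bandit` are hypothetical. Drawing one sample per arm from its posterior is the "randomly drawn belief"; picking the arm with the largest sample is the action that maximizes expected reward under that belief.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw one belief per arm from its Beta posterior and return the best arm."""
    # Assumed Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the sampled belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds=5000, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with the arm's hidden probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Illustrative payout probabilities; the learner never sees these directly.
pulls = run_bandit([0.2, 0.5, 0.8])
```

In this sketch the pull counts should concentrate on the arm with the highest payout probability as the posteriors sharpen, while the other arms still receive occasional pulls early on. That is how the sampling step trades exploration against exploitation without a separate exploration parameter.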

Papers

Showing 626–650 of 655 papers

Title | Status | Hype
Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems | Code | 0
Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling | Code | 0
Information-Directed Exploration for Deep Reinforcement Learning | Code | 0
VITS : Variational Inference Thompson Sampling for contextual bandits | Code | 0
Representative Action Selection for Large Action-Space Meta-Bandits | Code | 0
Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0
Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox | Code | 0
Efficient Exploration through Bayesian Deep Q-Networks | Code | 0
Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations | Code | 0
Thompson Sampling for Linearly Constrained Bandits | Code | 0
Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages | Code | 0
Tsetlin Machine for Solving Contextual Bandit Problems | Code | 0
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0
Bandit Learning with Implicit Feedback | Code | 0
Automated Creative Optimization for E-Commerce Advertising | Code | 0
Thompson Sampling with Information Relaxation Penalties | Code | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Odds-Ratio Thompson Sampling to Control for Time-Varying Effect | Code | 0
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems | Code | 0
Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0
Trajectory-oriented optimization of stochastic epidemiological models | Code | 0
On Bits and Bandits: Quantifying the Regret-Information Trade-off | Code | 0
Learning to Play Imperfect-Information Games by Imitating an Oracle Planner | Code | 0
Process-constrained batch Bayesian approaches for yield optimization in multi-reactor systems | Code | 0
ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0

No leaderboard results yet.