
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
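The "sample a belief, then act greedily on it" idea can be sketched for the Bernoulli-reward bandit, where each arm's success probability gets a Beta posterior. This is a minimal illustration, not code from any paper listed below; the arm probabilities and the `run_bernoulli_bandit` helper are assumptions for the example.

```python
import random

def thompson_step(successes, failures):
    # Draw one sample from each arm's Beta(successes + 1, failures + 1)
    # posterior (uniform prior), then act greedily on the sampled beliefs.
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bernoulli_bandit(probs, horizon=5000, seed=0):
    # Simulate Thompson sampling on arms with the given (hypothetical)
    # Bernoulli success probabilities; returns per-arm success/failure counts.
    random.seed(seed)
    k = len(probs)
    successes, failures = [0] * k, [0] * k
    for _ in range(horizon):
        arm = thompson_step(successes, failures)
        if random.random() < probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because the sampled beliefs concentrate as counts grow, the loop explores uncertain arms early and increasingly exploits the empirically best arm later.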

Papers

Showing 571-580 of 655 papers

Title | Status | Hype
Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling | Code | 0
Variational inference for the multi-armed contextual bandit | Code | 0
Learning to Price with Reference Effects | | 0
Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors | | 0
Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems | | 0
Reinforcement learning techniques for Outer Loop Link Adaptation in 4G/5G systems | | 0
Streaming kernel regression with provably adaptive mean, variance, and regularization | | 0
Counterfactual Data-Fusion for Online Reinforcement Learners | | 0
Taming Non-stationary Bandits: A Bayesian Approach | | 0
Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret | | 0
