SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
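
The description above can be illustrated with a minimal sketch for the Bernoulli bandit case. Several things here are assumptions for illustration and not taken from this page: the function name `thompson_sampling`, the uniform Beta(1, 1) prior on each arm, the 0/1 reward model, and the example arm means. Each round draws one belief per arm from its posterior, plays the arm whose drawn belief is largest, and updates that arm's posterior with the observed reward.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return pull counts per arm.

    true_means: hypothetical success probabilities, unknown to the learner.
    horizon:    number of rounds to play.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Randomly drawn belief: one sample per arm from its Beta posterior,
        # starting from an assumed uniform Beta(1, 1) prior.
        beliefs = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Play the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=beliefs.__getitem__)

        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Posterior update for the arm that was played.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls
```

Calling `thompson_sampling([0.2, 0.5, 0.8], horizon=2000)` returns a list of pull counts, one per arm. Arms with low estimated reward are still sampled occasionally while their posteriors remain wide, which is how the method trades off exploration against exploitation.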

Papers

Showing 481-490 of 655 papers

Title | Status | Hype
Scaling Multi-Armed Bandit Algorithms | - | 0
Convergence Rates of Posterior Distributions in Markov Decision Process | - | 0
Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning | Code | 0
Thompson Sampling on Symmetric α-Stable Bandits | - | 0
Thompson Sampling for Combinatorial Network Optimization in Unknown Environments | - | 0
Mixed-Variable Bayesian Optimization | - | 0
Bandit Learning for Diversified Interactive Recommendation | - | 0
Thompson Sampling for Adversarial Bit Prediction | - | 0
Revised Progressive-Hedging-Algorithm Based Two-layer Solution Scheme for Bayesian Reinforcement Learning | - | 0
Sparse Spectrum Gaussian Process for Bayesian Optimization | - | 0

No leaderboard results yet.