SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
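The description above can be sketched for the simplest case, a Bernoulli bandit with a Beta posterior per arm. This is a minimal illustration rather than the method of any paper listed here: the uniform Beta(1, 1) prior, the function names `thompson_select` and `run_bandit`, and the simulated reward probabilities are all assumptions made for the example. The "randomly drawn belief" is one sampled mean per arm, and the chosen action is the arm whose sampled mean is largest.

```python
import random


def thompson_select(successes, failures, rng=random):
    """Draw one plausible mean per arm from its Beta posterior and pick the best.

    Assumes Bernoulli rewards and a uniform Beta(1, 1) prior on every arm.
    """
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the sampled belief, not the posterior mean.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, steps=2000, seed=0):
    """Simulate Thompson sampling on hypothetical arms; return pulls per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(steps):
        arm = thompson_select(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_bandit([0.2, 0.5, 0.8]))
```

Early on the posteriors are wide, so every arm is sampled as the best reasonably often (exploration); as counts accumulate the posteriors concentrate and the arm with the highest true mean is chosen almost every round (exploitation).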

Papers

Showing 301-310 of 655 papers

Title | Status | Hype
Variational Bayesian Optimistic Sampling | | 0
Differentially Private Federated Bayesian Optimization with Distributed Exploration | | 0
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0
Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes | | 0
Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations | Code | 0
EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits | Code | 1
Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning | | 0
Batched Thompson Sampling | | 0
Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0
Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles | | 0

No leaderboard results yet.