SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
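
As a rough illustration of the description above, here is a minimal sketch for the Bernoulli-reward case. It assumes a uniform Beta(1, 1) prior on each arm's success probability; that prior and the names `thompson_step` and `run_bandit` are hypothetical choices made for this example, not taken from any paper listed here. The "randomly drawn belief" is one sample per arm from its Beta posterior, and the chosen action is the arm whose sampled mean is largest.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm and acting greedily on it.

    Assumes Bernoulli rewards with a Beta(1, 1) prior, so the posterior for
    an arm with s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1)  # one plausible mean reward per arm
        for s, f in zip(successes, failures)
    ]
    # Pick the arm that maximizes expected reward under the sampled belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run_bandit([0.1, 0.5, 0.9], n_rounds=2000)
    pulls = [a + b for a, b in zip(s, f)]
    print("pulls per arm:", pulls)
```

Because arms with few observations have wide posteriors, they are still sampled occasionally (exploration), while arms with consistently high rewards are chosen most of the time (exploitation).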

Papers

Showing 311-320 of 655 papers

| Title | Status | Hype |
| Online Learning of Energy Consumption for Navigation of Electric Vehicles | | 0 |
| Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling | | 0 |
| Variational Bayesian Optimistic Sampling | | 0 |
| Differentially Private Federated Bayesian Optimization with Distributed Exploration | | 0 |
| Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0 |
| Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes | | 0 |
| Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations | Code | 0 |
| Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning | | 0 |
| Batched Thompson Sampling | | 0 |
| Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0 |

No leaderboard results yet.