SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
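The description above can be illustrated with a minimal sketch. It assumes Bernoulli (0/1) rewards and an independent uniform Beta(1, 1) prior on each arm's success probability; that model, and the names `thompson_step` and `run_bandit`, are assumptions made for illustration and are not taken from any of the listed papers. Each round draws one plausible mean per arm from its posterior (the "randomly drawn belief") and plays the arm whose drawn mean is largest.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw one mean per arm from its Beta posterior and return the argmax arm."""
    # Assumed model: Beta(1, 1) prior, so the posterior is Beta(s + 1, f + 1).
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Hypothetical arm means, chosen only for this demonstration.
    print(run_bandit([0.2, 0.5, 0.8], n_rounds=2000))
```

Early on the posteriors are wide, so every arm has a real chance of producing the largest draw and gets explored. As observations accumulate, the posteriors concentrate and the draws increasingly favor the arm with the highest true mean, which is how a single sampling rule covers both exploration and exploitation.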

Papers

Showing 426-450 of 655 papers

Title | Status | Hype
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling | | 0
Concurrent Decentralized Channel Allocation and Access Point Selection using Multi-Armed Bandits in multi BSS WLANs | | 0
Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-Start Users | Code | 1
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints | | 0
Learning to Rank in the Position Based Model with Bandit Feedback | | 0
Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization | | 0
Adaptive Operator Selection Based on Dynamic Thompson Sampling for MOEA/D | | 0
Thompson Sampling for Linearly Constrained Bandits | Code | 0
Optimal No-regret Learning in Repeated First-price Auctions | | 0
A Reliability-aware Multi-armed Bandit Approach to Learn and Select Users in Demand Response | | 0
Delay-Adaptive Learning in Generalized Linear Contextual Bandits | | 0
Online Residential Demand Response via Contextual Multi-Armed Bandits | | 0
Odds-Ratio Thompson Sampling to Control for Time-Varying Effect | Code | 0
MOTS: Minimax Optimal Thompson Sampling | | 0
An Online Learning Framework for Energy-Efficient Navigation of Electric Vehicles | | 0
On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks | Code | 1
Efficient exploration of zero-sum stochastic games | | 0
On Thompson Sampling with Langevin Algorithms | | 0
Residual Bootstrap Exploration for Bandit Algorithms | | 0
A General Theory of the Stochastic Linear Bandit and Its Applications | | 0
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity | | 0
Thompson Sampling Algorithms for Mean-Variance Bandits | Code | 0
Bayesian Quantile and Expectile Optimisation | | 0
On Thompson Sampling for Smoother-than-Lipschitz Bandits | | 0
Making Sense of Reinforcement Learning and Probabilistic Inference | | 0
Page 18 of 27

No leaderboard results yet.