
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
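The "randomly drawn belief" step can be made concrete with the standard Beta-Bernoulli case: each arm keeps a Beta posterior over its success probability, one sample is drawn from every posterior each round, and the arm with the largest sample is pulled. The following is a minimal sketch under that assumption (the function name, arm probabilities, and round count are illustrative, not from any particular paper above):

```python
import random

def thompson_sampling(true_probs, n_rounds=10000, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (illustrative example).

    Each arm i keeps a Beta(successes[i]+1, failures[i]+1) posterior over
    its reward probability, starting from a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one belief per arm from its current Beta posterior ...
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        # ... then act greedily with respect to that randomly drawn belief.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward
```

Because arms with uncertain posteriors occasionally produce large samples, the algorithm keeps exploring them, while arms with confidently low posteriors are pulled less and less; over many rounds most pulls concentrate on the best arm.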

Papers

Showing 341–350 of 655 papers

Title | Status | Hype
Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization | | 0
A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting | | 0
Exploration for Multi-task Reinforcement Learning with Deep Generative Models | | 0
Exploration via linearly perturbed loss minimisation | | 0
Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network | | 0
Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization | | 0
Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning | | 0
Feel-Good Thompson Sampling for Contextual Dueling Bandits | | 0
Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits | | 0
First-Order Bayesian Regret Analysis of Thompson Sampling | | 0
Page 35 of 66
