SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a random sample from the current posterior belief over the reward model, then chooses the action that maximizes the expected reward under that sampled belief.
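The selection rule can be illustrated with a minimal sketch. It assumes Bernoulli (0/1) rewards and an independent Beta(1, 1) prior on each arm's success probability. The Beta-Bernoulli model, the hypothetical function name `thompson_sampling`, and the simulated arm probabilities are illustrative assumptions and do not come from the papers listed on this page.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_probs: hidden success probability of each arm (used only to simulate rewards).
    Returns (pull count per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) is a uniform prior on each arm's success probability.
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success probability per arm from its posterior:
        # this is the "randomly drawn belief".
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act greedily with respect to the sampled belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the arm's hidden probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: the posterior stays a Beta distribution.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


pulls, total = thompson_sampling([0.1, 0.5, 0.7], n_rounds=2000)
print(pulls, total)
```

Arms whose posteriors are still wide occasionally produce high samples and get explored. As evidence accumulates, the posteriors narrow and most pulls concentrate on the best arm (here the arm with probability 0.7).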

Papers

Showing 461–470 of 655 papers

Title | Status | Hype
Incentivized Exploration for Multi-Armed Bandits under Reward Drift | - | 0
Safe Linear Thompson Sampling with Side Information | - | 0
On Batch Bayesian Optimization | - | 0
On Online Learning in Kernelized Markov Decision Processes | - | 0
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | - | 0
Thompson Sampling via Local Uncertainty | Code | 0
Fixed-Confidence Guarantees for Bayesian Best-Arm Identification | - | 0
Thompson Sampling in Non-Episodic Restless Bandits | - | 0
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems | Code | 0
Regret Analysis of Bandit Problems with Causal Background Knowledge | - | 0
Page 47 of 66

No leaderboard results yet.