SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
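The description above can be illustrated with a minimal sketch for the Bernoulli bandit case. The belief about each arm is assumed here to be a Beta posterior starting from a uniform prior; the function names `thompson_step` and `run_bandit`, and the simulated reward rates, are hypothetical and chosen only for illustration.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one plausible success rate per arm."""
    # Assumption: uniform Beta(1, 1) prior, so the posterior is Beta(1 + s, 1 + f).
    samples = [
        rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the randomly drawn belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_rates, rounds, seed=0):
    """Simulate Bernoulli arms and return per-arm success and failure counts."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated reward: 1 with probability true_rates[arm], else 0.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.9], rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because each arm's rate is sampled rather than estimated by its mean, arms with little data still get chosen occasionally (exploration), while arms with strong evidence of high reward are chosen most of the time (exploitation).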

Papers

Showing 111-120 of 655 papers

Title | Status | Hype
Bandit Models of Human Behavior: Reward Processing in Mental Disorders | | 0
Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility | | 0
Bandits Under The Influence (Extended Version) | | 0
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization | | 0
Batch Bayesian Optimization for Replicable Experimental Design | | 0
A Note on Information-Directed Sampling and Thompson Sampling | | 0
Batched Thompson Sampling | | 0
Batched Thompson Sampling for Multi-Armed Bandits | | 0
An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits | | 0
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits | | 0

No leaderboard results yet.