SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
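
The description above can be made concrete with a minimal sketch, assuming a Bernoulli bandit with independent Beta(1, 1) priors on each arm (a common textbook setting, not something this page specifies). In each round, one plausible success rate per arm is drawn from its posterior (the "randomly drawn belief"), and the arm whose draw is largest is played. The function name thompson_sampling and the parameters true_probs, n_rounds, and seed are hypothetical names chosen for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (illustrative assumptions only).

    true_probs: hidden success probability of each arm, used only to
                simulate rewards; the learner never reads it directly.
    Returns (pulls, successes, failures), one list entry per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its posterior Beta(1 + s, 1 + f),
        # which follows from the assumed uniform Beta(1, 1) prior.
        beliefs = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: beliefs[a])

        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the posterior counts of the arm that was played.
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures
```

Calling thompson_sampling([0.1, 0.9], 2000) would be expected to concentrate most pulls on the second arm: early on the posteriors are wide, so both arms get tried, and as evidence accumulates the draws for the weaker arm rarely come out on top.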

Papers

Showing 611-620 of 655 papers

Title | Status | Hype
Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control | | 0
Bandit Convex Optimization: √T Regret in One Dimension | | 0
Bandit Learning for Diversified Interactive Recommendation | | 0
Bandit Models of Human Behavior: Reward Processing in Mental Disorders | | 0
Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility | | 0
Bandits Under The Influence (Extended Version) | | 0
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization | | 0
Batch Bayesian Optimization for Replicable Experimental Design | | 0
Batched Thompson Sampling | | 0
Batched Thompson Sampling for Multi-Armed Bandits | | 0

No leaderboard results yet.