SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief (e.g., a set of model parameters) at random from the current posterior distribution and chooses the action that maximizes expected reward under that sampled belief.
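The posterior-sampling rule above can be sketched for the simplest case, a Bernoulli bandit with conjugate Beta priors. This is a minimal illustration, not code from any of the papers listed below; the arm probabilities, round count, and function name are chosen for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a Bernoulli bandit.

    true_probs: unknown-to-the-agent success probability of each arm.
    Returns the per-arm Beta posterior parameters and the total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Start from uniform Beta(1, 1) priors: alpha counts successes,
    # beta counts failures, per arm.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior, then act greedily
        # on the sampled beliefs -- this is the Thompson sampling step.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        # Conjugate posterior update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta, total_reward


alpha, beta, total = thompson_sampling([0.2, 0.5, 0.8])
# The arm pulled most often is the one the posterior has concentrated on.
most_pulled = max(range(3), key=lambda i: alpha[i] + beta[i])
```

Because posterior samples are random, poorly understood arms (wide posteriors) are occasionally sampled high and explored, while well-understood good arms are exploited; no explicit exploration schedule is needed.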

Papers

Showing 151–160 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Bandits Under The Influence (Extended Version) | | 0 |
| Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0 |
| Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility | | 0 |
| Bandit Models of Human Behavior: Reward Processing in Mental Disorders | | 0 |
| Analysis of Thompson Sampling for Graphical Bandits Without the Graphs | | 0 |
| Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits | | 0 |
| A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms | | 0 |
| Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | | 0 |
| Bandit Learning for Diversified Interactive Recommendation | | 0 |
| Adaptive Rate of Convergence of Thompson Sampling for Gaussian Process Optimization | | 0 |
Page 16 of 66
