SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief (e.g. a posterior over each arm's reward) at random from the agent's current distribution over models, then chooses the action that maximizes expected reward under that sampled belief.
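A minimal sketch of this idea for a Bernoulli bandit with Beta posteriors (the arm payout probabilities and the Beta(1, 1) priors below are illustrative assumptions, not from any specific paper on this page):

```python
import random

def thompson_step(successes, failures, true_probs, rng):
    """One round of Thompson sampling on a Bernoulli bandit."""
    # Draw one sample per arm from its Beta posterior -- the "randomly drawn belief".
    samples = [rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior assumed
               for s, f in zip(successes, failures)]
    # Act greedily with respect to the sampled belief.
    arm = max(range(len(samples)), key=samples.__getitem__)
    reward = 1 if rng.random() < true_probs[arm] else 0
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1
    return arm, reward

rng = random.Random(0)
true_probs = [0.2, 0.5, 0.8]  # hypothetical, unknown to the agent
successes = [0, 0, 0]
failures = [0, 0, 0]
for _ in range(2000):
    thompson_step(successes, failures, true_probs, rng)
```

Because each arm is sampled in proportion to the posterior probability that it is optimal, play concentrates on the best arm (index 2 here) while under-explored arms still get occasional pulls.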

Papers

Showing 591–600 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Posterior sampling for reinforcement learning: worst-case regret bounds | | 0 |
| Adaptive Rate of Convergence of Thompson Sampling for Gaussian Process Optimization | | 0 |
| Context Attentive Bandits: Contextual Bandit with Restricted Context | | 0 |
| Multi-dueling Bandits with Dependent Arms | | 0 |
| Time-Sensitive Bandit Learning and Satisficing Thompson Sampling | | 0 |
| Mostly Exploration-Free Algorithms for Contextual Bandits | Code | 0 |
| Efficient Benchmarking of NLP APIs using Multi-armed Bandits | | 0 |
| Thompson Sampling for Linear-Quadratic Control Problems | | 0 |
| Horde of Bandits using Gaussian Markov Random Fields | | 0 |
| QoS-Aware Multi-Armed Bandits | | 0 |
Page 60 of 66

No leaderboard results yet.