SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
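The description above can be made concrete with a minimal sketch for the Bernoulli bandit case. This assumes a uniform Beta(1, 1) prior on each arm's success probability; the function name `thompson_sampling`, the `true_probs` argument, and the simulated rewards are illustrative choices and do not come from any of the listed papers. In each round, one success probability per arm is drawn from that arm's posterior (the "randomly drawn belief"), and the arm with the largest draw is played.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, unknown to the learner.
    Returns per-arm pull counts, successes, and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its Beta posterior (Beta(1, 1) prior assumed).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm that is best under the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Update the posterior counts of the played arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce large draws, they still get explored, while arms that are confidently good are played most of the time.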

Papers

Showing 561–570 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Efficient-UCBV: An Almost Optimal Algorithm using Variance Estimates | | 0 |
| Information Directed Sampling for Stochastic Bandits with Graph Feedback | | 0 |
| The Effect of Communication on Noncooperative Multiplayer Multi-Armed Bandit Problems | | 0 |
| Generalized Probabilistic Bisection for Stochastic Root-Finding | | 0 |
| Minimal Exploration in Structured Stochastic Bandits | | 0 |
| Sequential Matrix Completion | | 0 |
| A study of Thompson Sampling with Parameter h | | 0 |
| Learning Unknown Markov Decision Processes: A Thompson Sampling Approach | | 0 |
| Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits | | 0 |
| Variational inference for the multi-armed contextual bandit | Code | 0 |

No leaderboard results yet.