
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. In each round, it draws a belief at random from the current posterior distribution over the reward model. It then chooses the action that maximizes the expected reward under that sampled belief.
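The procedure described above can be illustrated with a minimal sketch. It assumes Bernoulli (0/1) rewards and an independent Beta(1, 1) prior on each arm's success probability. This Beta-Bernoulli setting is one common instance of Thompson sampling, not the only one. The function name `thompson_sampling` and the simulated `true_probs` are hypothetical and exist only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms (illustrative sketch).

    true_probs: hypothetical per-arm success probabilities, hidden from the agent.
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's success probability.
    alpha = [1] * n_arms  # 1 + observed successes
    beta = [1] * n_arms   # 1 + observed failures
    pulls = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success probability per arm from its posterior...
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # ...and act greedily with respect to that random draw.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate the environment's 0/1 reward for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: the posterior becomes Beta(alpha + r, beta + 1 - r).
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    pulls, reward = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
    print(pulls, reward)
```

The agent acts on a random posterior draw rather than on the posterior mean. Arms with few observations therefore keep wide posteriors and are still tried occasionally, and this is where the exploration comes from. As evidence accumulates, the posteriors concentrate and play shifts toward the best arm.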

Papers

Showing 301-310 of 655 papers

| Title | Status | Hype |
|---|---|---|
| Safe Linear Leveling Bandits | | 0 |
| Risk and optimal policies in bandit experiments | | 0 |
| Optimizing Conditional Value-At-Risk of Black-Box Functions | Code | 0 |
| Doubly Robust Thompson Sampling with Linear Payoffs | | 0 |
| Observation-Free Attacks on Stochastic Bandits | | 0 |
| Adaptive Gating for Single-Photon 3D Imaging | | 0 |
| ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0 |
| Hierarchical Bayesian Bandits | | 0 |
| The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle | | 0 |
| Maillard Sampling: Boltzmann Exploration Done Optimally | | 0 |

No leaderboard results yet.