SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
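As a rough illustration of "maximizing expected reward with respect to a randomly drawn belief", the sketch below uses a Bernoulli bandit with a Beta posterior per arm. The Beta(1, 1) prior, the Bernoulli reward model, and the names `thompson_step` and `run_bandit` are assumptions chosen for this example, not something the listing specifies.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm and acting greedily on it."""
    # Sample a plausible mean reward for each arm from its Beta posterior.
    # Beta(1, 1) (uniform) prior is an assumption of this sketch.
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # Under the sampled belief, the best action is the arm with the largest draw.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit; return pulls per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Hypothetical environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    print(run_bandit([0.1, 0.9], rounds=2000))
```

Arms with little data have wide posteriors and are still sampled occasionally (exploration), while arms with strong evidence of high reward are chosen most of the time (exploitation).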

Papers

Showing 251–260 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs | | 0 |
| Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes | | 0 |
| Thompson Sampling for (Combinatorial) Pure Exploration | | 0 |
| Thompson Sampling Achieves Õ(√T) Regret in Linear Quadratic Control | | 0 |
| Thompson Sampling for Robust Transfer in Multi-Task Bandits | Code | 0 |
| A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | | 0 |
| On Provably Robust Meta-Bayesian Optimization | Code | 0 |
| Top Two Algorithms Revisited | | 0 |
| Regret Bounds for Information-Directed Reinforcement Learning | | 0 |
| A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits | | 0 |

No leaderboard results yet.