SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior distribution and chooses the action that maximizes the expected reward under that drawn belief.
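As an illustration of the definition above, here is a minimal sketch for the simplest case: a Bernoulli bandit with an independent uniform Beta(1, 1) prior on each arm. The prior, the function names `thompson_step` and `run_bandit`, and the example reward probabilities are all assumptions made for this sketch and are not taken from any of the listed papers.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm and acting greedily on it.

    Each arm's posterior is Beta(1 + successes, 1 + failures), which follows
    from the assumed Beta(1, 1) prior and Bernoulli rewards.
    """
    samples = [
        rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
    ]
    # The drawn sample is the arm's expected reward under the drawn belief,
    # so the maximizing action is the arm with the largest sample.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling and return the number of pulls per arm.

    `true_means` holds hypothetical success probabilities, used only to
    generate rewards; the algorithm never reads them directly.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        # Posterior update: count the observed outcome for the pulled arm.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]
```

Calling `run_bandit([0.2, 0.5, 0.8], 2000)` returns a list of pull counts per arm. Over enough rounds the pulls tend to concentrate on the arm with the highest success probability, while the random draw keeps some exploration of the others.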

Papers

Showing 371–380 of 655 papers

Title | Status | Hype
High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling | | 0
Horde of Bandits using Gaussian Markov Random Fields | | 0
Human collective intelligence as distributed Bayesian inference | | 0
Hypermodels for Exploration | | 0
IBAC: An Intelligent Dynamic Bandwidth Channel Access Avoiding Outside Warning Range Problem | | 0
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning | | 0
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems | | 0
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration | | 0
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | | 0
Improving sample efficiency of high dimensional Bayesian optimization with MCMC | | 0

No leaderboard results yet.