SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
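The definition above can be made concrete with a minimal sketch. It assumes Bernoulli (0/1) rewards and an independent uniform Beta(1, 1) prior on each arm's success probability; neither assumption comes from this page. The function name `thompson_sampling` and its parameters `true_probs`, `n_rounds`, and `seed` are hypothetical and chosen only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Illustrative Beta-Bernoulli Thompson sampling (assumed setting, not a reference implementation).

    true_probs: hidden success probability of each arm (used only to simulate rewards).
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior.
        # Beta(1 + successes, 1 + failures) is the posterior under a uniform prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under the drawn belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the hidden probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the posterior counts for the chosen arm.
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.5, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because an arm is chosen only when its sampled value is the largest, arms with uncertain posteriors still get tried occasionally (exploration), while arms with high, well-established estimates are chosen most of the time (exploitation).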

Papers

Showing 381-390 of 655 papers

Title | Status | Hype
Reward Biased Maximum Likelihood Estimation for Reinforcement Learning | | 0
Accelerating Grasp Exploration by Leveraging Learned Priors | | 0
Multi-Agent Active Search using Realistic Depth-Aware Noise Model | Code | 0
Thompson sampling for linear quadratic mean-field teams | | 0
Asymptotic Convergence of Thompson Sampling | | 0
Adaptive Combinatorial Allocation | | 0
Greedy k-Center from Noisy Distance Samples | | 0
Multi-armed Bandits with Cost Subsidy | | 0
Screening for an Infectious Disease as a Problem in Stochastic Control | | 0
Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility | | 0

No leaderboard results yet.