
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
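As an illustration of the definition above, here is a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability, and the names `thompson_step` and `run_bandit` are hypothetical helpers introduced for this example, not taken from any of the listed papers. The "randomly drawn belief" is one sample from each arm's Beta posterior; the chosen action is the arm whose sampled success probability is largest.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw one belief per arm from its Beta posterior and return the best arm.

    Assumes a Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the sampled belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with the arm's true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical arm probabilities, chosen only for illustration.
    print(run_bandit([0.2, 0.5, 0.8], n_rounds=2000))
```

Early on the posteriors are wide, so every arm is sampled as "best" fairly often (exploration); as counts accumulate, the posteriors concentrate and the arm with the highest true probability is chosen most of the time (exploitation).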

Papers

Showing 371-380 of 655 papers

Title | Status | Hype
Non-Stationary Latent Bandits | | 0
On Efficiency in Hierarchical Reinforcement Learning | | 0
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning | | 0
Reward Biased Maximum Likelihood Estimation for Reinforcement Learning | | 0
Risk-Constrained Thompson Sampling for CVaR Bandits | | 0
Accelerating Grasp Exploration by Leveraging Learned Priors | | 0
Thompson sampling for linear quadratic mean-field teams | | 0
Multi-Agent Active Search using Realistic Depth-Aware Noise Model | Code | 0
Asymptotic Convergence of Thompson Sampling | | 0
Adaptive Combinatorial Allocation | | 0

No leaderboard results yet.