SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
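As a rough illustration of the description above, the following is a minimal sketch of Thompson sampling for a Bernoulli bandit. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm's success probability; neither assumption comes from this page, and the names `thompson_step` and `run_bandit` are hypothetical. The "randomly drawn belief" is one sample per arm from its Beta posterior, and the chosen action is the arm whose sampled mean is largest.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one plausible mean per arm from its posterior."""
    # With a Beta(1, 1) prior (an assumption), the posterior after s successes
    # and f failures is Beta(s + 1, f + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the sampled belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, horizon, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical arm means, chosen only for the demonstration.
    print(run_bandit([0.2, 0.5, 0.8], horizon=2000))
```

Early on the posteriors are wide, so every arm is sampled as best fairly often (exploration). As counts accumulate the posteriors narrow and the arm with the highest mean wins most draws (exploitation).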

Papers

Showing 441–450 of 655 papers

Title | Status | Hype
An Online Learning Framework for Energy-Efficient Navigation of Electric Vehicles | | 0
MOTS: Minimax Optimal Thompson Sampling | | 0
Efficient exploration of zero-sum stochastic games | | 0
On Thompson Sampling with Langevin Algorithms | | 0
Residual Bootstrap Exploration for Bandit Algorithms | | 0
A General Theory of the Stochastic Linear Bandit and Its Applications | | 0
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity | | 0
Thompson Sampling Algorithms for Mean-Variance Bandits | Code | 0
Bayesian Quantile and Expectile Optimisation | | 0
On Thompson Sampling for Smoother-than-Lipschitz Bandits | | 0

No leaderboard results yet.