
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. In each round it draws a belief at random from the current posterior over reward models and chooses the action that maximizes the expected reward under that sampled belief.
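As a rough illustration of the idea, here is a minimal sketch for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The names thompson_step and run_bandit are hypothetical helpers made up for this example and are not taken from any of the papers listed on this page.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample one belief per arm from its Beta posterior and return the best arm.

    Assumption: each arm starts from a uniform Beta(1, 1) prior, so the
    posterior after s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the randomly drawn belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit with hidden true_probs."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Draw a 0/1 reward from the chosen arm's true success probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run_bandit([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", [a + b for a, b in zip(s, f)])
```

Because the action is chosen from a posterior sample rather than the posterior mean, arms with few observations still get picked occasionally, and that is where the exploration comes from. As the counts grow, the posteriors concentrate and the better arm is pulled most of the time.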

Papers


Showing 621–630 of 655 papers

Title	Date	Tasks	Status
Thompson Sampling is Asymptotically Optimal in General Environments	Feb 25, 2016	reinforcement-learning, Reinforcement Learning	Unverified
Convolutional Monte Carlo Rollouts in Go	Dec 10, 2015	GPU, Thompson Sampling	Unverified
Efficient Thompson Sampling for Online Matrix-Factorization Recommendation	Dec 1, 2015	Collaborative Filtering, Recommendation Systems	Unverified
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits	Nov 18, 2015	Multi-Armed Bandits, Thompson Sampling	Unverified
TSEB: More Efficient Thompson Sampling for Policy Learning	Oct 10, 2015	Thompson Sampling	Unverified
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models	Jul 3, 2015	Atari Games, reinforcement-learning	Code Available
Bootstrapped Thompson Sampling and Deep Exploration	Jul 1, 2015	reinforcement-learning, Reinforcement Learning	Unverified
On the Prior Sensitivity of Thompson Sampling	Jun 10, 2015	Sensitivity, Thompson Sampling	Unverified
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays	Jun 2, 2015	Thompson Sampling	Code Available
Belief Flows of Robust Online Learning	May 26, 2015	General Classification, regression	Unverified



No leaderboard results yet.