SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
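As a rough illustration of the definition above, the following is a minimal sketch for a Bernoulli bandit, assuming a uniform Beta(1, 1) prior on each arm's success probability. The function names `thompson_select` and `run_bandit`, the arm probabilities, and the seed are hypothetical choices for illustration and are not taken from any of the listed papers. The "randomly drawn belief" is one sampled mean per arm, and the chosen action is the arm whose sampled mean is largest.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one plausible mean per arm from its Beta posterior and
    return the index of the arm with the largest draw."""
    # Beta(s + 1, f + 1) is the posterior under an assumed uniform prior.
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, steps, seed=0):
    """Simulate a Bernoulli bandit and return per-arm success/failure counts."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(steps):
        arm = thompson_select(successes, failures, rng)
        # Simulated environment: reward 1 with the arm's true probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.2, 0.5, 0.8], steps=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because arms with few observations have wide posteriors, they are still sampled occasionally (exploration), while arms with strong evidence of high reward are chosen most of the time (exploitation).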

Papers

Showing 651-655 of 655 papers

Title | Status | Hype
Thompson Sampling for Contextual Bandits with Linear Payoffs | Code | 0
Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic | Code | 0
AIXIjs: A Software Demo for General Reinforcement Learning | Code | 0
Thompson Sampling Algorithms for Mean-Variance Bandits | Code | 0
Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0

No leaderboard results yet.