SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
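
The description above can be made concrete with a minimal sketch for the simplest case, a bandit with 0/1 rewards. The setup is an assumption for illustration, not taken from any paper listed here: each arm gets a uniform Beta(1, 1) prior, the arms are simulated from hypothetical success probabilities in `true_probs`, and the function name `thompson_sampling` is made up. In each round the code draws one mean-reward sample per arm from its posterior (the "randomly drawn belief") and pulls the arm whose sample is largest.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # number of reward-1 outcomes observed per arm
    failures = [0] * n_arms   # number of reward-0 outcomes observed per arm
    pulls = [0] * n_arms      # number of times each arm was chosen

    for _ in range(n_rounds):
        # Draw one plausible mean reward per arm from its Beta posterior:
        # the assumed Beta(1, 1) prior updated with the observed counts.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulated environment: reward is 1 with probability true_probs[arm].
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they occasionally produce the largest sample and get explored; as counts grow, the posteriors narrow and the pulls concentrate on the arm with the highest estimated reward.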

Papers

Showing 71-80 of 655 papers

Title | Status | Hype
Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0
Epinet for Content Cold Start | - | 0
Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0
Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem | - | 0
Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program | - | 0
BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration | - | 0
Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks | - | 0
Distributed Thompson sampling under constrained communication | Code | 0
Aligning AI Agents via Information-Directed Sampling | - | 0
Queueing Matching Bandits with Preference Feedback | Code | 0

No leaderboard results yet.