SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
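As a rough illustration of "maximizing expected reward with respect to a randomly drawn belief", here is a minimal sketch for the Bernoulli bandit case. It assumes binary rewards, a uniform Beta(1, 1) prior on each arm, and simulated arms; the function name `thompson_sampling` and its parameters `true_probs`, `n_rounds`, and `seed` are hypothetical and not taken from any paper listed here.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on simulated arms (illustrative only).

    true_probs: assumed success probability of each simulated arm.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # count of reward 1 observed per arm
    failures = [0] * n_arms   # count of reward 0 observed per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its Beta(1 + successes, 1 + failures) posterior.
        sampled_means = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Choose the action that is best under the randomly drawn belief.
        arm = max(range(n_arms), key=lambda a: sampled_means[a])

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling([0.1, 0.9], n_rounds=2000)
```

Early on the posteriors are wide, so the sampled beliefs vary and every arm gets tried; as evidence accumulates the draws concentrate and the better arm tends to be chosen more often. Other reward models would need a different posterior update than the Beta counts assumed here.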

Papers

Showing 51-60 of 655 papers

Title | Status | Hype
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding | Code | 1
Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program | - | 0
BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration | - | 0
Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks | - | 0
Distributed Thompson sampling under constrained communication | Code | 0
Aligning AI Agents via Information-Directed Sampling | - | 0
Queueing Matching Bandits with Preference Feedback | Code | 0
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing | - | 0
Gaussian Process Thompson Sampling via Rootfinding | - | 0
Contextual Bandits with Non-Stationary Correlated Rewards for User Association in MmWave Vehicular Networks | - | 0

No leaderboard results yet.