SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
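The selection rule described above can be illustrated with a minimal sketch. It assumes Bernoulli (0/1) rewards and a uniform Beta(1, 1) prior on each arm's mean reward; neither assumption comes from this page, and the names `thompson_step` and `run_bandit` are hypothetical. The "randomly drawn belief" is one sample per arm from that arm's Beta posterior, and the chosen action is the arm whose sample is largest.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one sampled mean per arm from its Beta posterior.

    Assumes Bernoulli rewards with a Beta(1, 1) prior, so the posterior for an
    arm with s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the sampled belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because arms with few observations have wide posteriors, they are occasionally sampled high and get explored; as evidence accumulates, the posteriors narrow and the better arm is chosen more and more often.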

Papers

Showing 401-410 of 655 papers

Title | Status | Hype
Robust Dynamic Assortment Optimization in the Presence of Outlier Customers | | 0
Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments | | 0
Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks | | 0
Safe Linear Leveling Bandits | | 0
Safe Linear Thompson Sampling with Side Information | | 0
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit | | 0
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity | | 0
Sampling Acquisition Functions for Batch Bayesian Optimization | | 0
Satisficing in Time-Sensitive Bandit Learning | | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | | 0

No leaderboard results yet.