
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
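The procedure above can be sketched for the simplest case, a Bernoulli bandit with conjugate Beta(1, 1) priors. This is an illustrative sketch, not code from any of the papers below; the function name and parameters are my own. Each round, one belief is sampled from each arm's posterior, the arm whose sample is largest is played, and its posterior is updated with the observed reward.

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors.

    `true_probs` gives each arm's (unknown to the agent) success rate.
    Returns the per-arm posterior parameters and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # 1 + observed successes for each arm
    beta = [1] * k   # 1 + observed failures for each arm
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one belief sample per arm; play the arm that maximizes
        # expected reward under that randomly drawn belief.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        # Conjugate posterior update for the Bernoulli likelihood.
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta, total_reward
```

Because arms are chosen by posterior sampling rather than by a point estimate, an arm is played in proportion to the probability that it is optimal, so exploration decays naturally as the posteriors concentrate.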

Papers

Showing 301–310 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Improving sample efficiency of high dimensional Bayesian optimization with MCMC | | 0 |
| Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments | | 0 |
| Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | | 0 |
| Incentivizing Combinatorial Bandit Exploration | | 0 |
| Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems | | 0 |
| Incentivizing Exploration with Linear Contexts and Combinatorial Actions | | 0 |
| Incorporating Behavioral Constraints in Online AI Systems | | 0 |
| Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits | | 0 |
| Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits | | 0 |
| Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification | | 0 |

No leaderboard results yet.