SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
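
The description above can be sketched for the simplest case, a Bernoulli bandit. This is a minimal illustration, not a reference implementation. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm, neither of which is stated in the text above. The function name `thompson_sampling` and the parameters `true_probs`, `n_rounds`, and `seed` are hypothetical.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling; return pull counts per arm.

    Assumptions (for illustration only): rewards are 0/1 and each arm
    starts with a uniform Beta(1, 1) prior over its success probability.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # The "randomly drawn belief": one sampled mean reward per arm,
        # drawn from that arm's current Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under the draw.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate the environment, then update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Arms whose posteriors are still wide occasionally produce the largest sample, so they keep being explored. As evidence accumulates, the posteriors narrow and pulls tend to concentrate on the arm with the highest true success probability.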

Papers

Showing 381–390 of 655 papers

Title | Status | Hype
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | | 0
Incentivized Exploration for Multi-Armed Bandits under Reward Drift | | 0
Incentivizing Combinatorial Bandit Exploration | | 0
Incentivizing Exploration with Linear Contexts and Combinatorial Actions | | 0
Incorporating Behavioral Constraints in Online AI Systems | | 0
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits | | 0
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits | | 0
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits | | 0
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems | | 0
Influencing Bandits: Arm Selection for Preference Shaping | | 0

No leaderboard results yet.