
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
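The mechanism described above, drawing a belief from the posterior and acting greedily on that draw, can be sketched for a Bernoulli bandit with Beta priors. This is a minimal illustration, not any particular paper's implementation; the function names are illustrative.

```python
import random

def thompson_step(successes, failures):
    """One round of Thompson sampling for a Bernoulli bandit.

    Draw a plausible mean reward for each arm from its Beta posterior
    (uniform Beta(1,1) prior assumed), then play the arm whose draw is largest.
    """
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_means, rounds=2000, seed=0):
    """Simulate Thompson sampling; return per-arm pull counts."""
    random.seed(seed)
    k = len(true_means)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_step(successes, failures)
        if random.random() < true_means[arm]:  # Bernoulli reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

pulls = run_bandit([0.2, 0.5, 0.8])
```

Because each arm is chosen with probability equal to its posterior chance of being the best, pulls concentrate on the highest-mean arm as evidence accumulates, while arms with uncertain posteriors still get occasional exploratory pulls.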

Papers

Showing 301–325 of 655 papers

Approximate information for efficient exploration-exploitation strategies
Fast Change Identification in Multi-Play Bandits and its Applications in Wireless Networks
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits
Improving sample efficiency of high dimensional Bayesian optimization with MCMC
Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments
Incentivizing Exploration with Linear Contexts and Combinatorial Actions
Incorporating Behavioral Constraints in Online AI Systems
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions
Influencing Bandits: Arm Selection for Preference Shaping
Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems
Information Directed Sampling and Bandits with Heteroscedastic Noise
Information Directed Sampling for Stochastic Bandits with Graph Feedback
Information-Theoretic Confidence Bounds for Reinforcement Learning
IntelligentPooling: Practical Thompson Sampling for mHealth
Joint User Association and Pairing in Multi-UAV-Assisted NOMA Networks: A Decaying-Epsilon Thompson Sampling Framework
Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification
KLUCB Approach to Copeland Bandits
Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems
Causal Bandits without prior knowledge using separating sets
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
Page 13 of 27

No leaderboard results yet.