SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
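
As a minimal sketch of the idea above, assuming a Bernoulli bandit with independent uniform Beta(1, 1) priors on each arm (a common textbook setting, not something this page specifies): each round draws one sample from every arm's posterior (the "randomly drawn belief"), plays the arm whose sample is largest, and updates that arm's counts. The names `thompson_step` and `run_bandit` and the example reward probabilities are hypothetical, chosen for illustration only.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample each arm's Beta posterior and return the index of the largest sample."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with the arm's (hidden) probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Hypothetical reward probabilities, for illustration only.
    print(run_bandit([0.2, 0.5, 0.8], n_rounds=2000))
```

Because the choice is driven by posterior samples rather than posterior means, arms with few observations still get played occasionally, while pulls concentrate on the best arm as its posterior sharpens.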

Papers

Showing 376-400 of 655 papers

Title | Status | Hype
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning | | 0
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems | | 0
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration | | 0
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | | 0
Improving sample efficiency of high dimensional Bayesian optimization with MCMC | | 0
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | | 0
Incentivized Exploration for Multi-Armed Bandits under Reward Drift | | 0
Incentivizing Combinatorial Bandit Exploration | | 0
Incentivizing Exploration with Linear Contexts and Combinatorial Actions | | 0
Incorporating Behavioral Constraints in Online AI Systems | | 0
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits | | 0
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits | | 0
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits | | 0
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems | | 0
Influencing Bandits: Arm Selection for Preference Shaping | | 0
Information Directed Sampling and Bandits with Heteroscedastic Noise | | 0
Information Directed Sampling for Stochastic Bandits with Graph Feedback | | 0
Information-Theoretic Confidence Bounds for Reinforcement Learning | | 0
IntelligentPooling: Practical Thompson Sampling for mHealth | | 0
Joint User Association and Pairing in Multi-UAV-Assisted NOMA Networks: A Decaying-Epsilon Thompson Sampling Framework | | 0
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems | | 0
KLUCB Approach to Copeland Bandits | | 0
Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits | | 0
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | | 0
Latent Bandits Revisited | | 0

No leaderboard results yet.