SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
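The idea of maximizing expected reward under a randomly drawn belief can be sketched for the simplest case, a Bernoulli bandit with Beta posteriors. This is a minimal illustrative sketch, not an implementation from any paper listed below; the function name and parameters are invented for the example.

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a multi-armed bandit.

    true_probs holds each arm's success probability, unknown to the agent;
    the agent maintains a Beta(successes + 1, failures + 1) posterior per arm.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior belief ...
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        # ... then act greedily with respect to that random belief.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Because actions are sampled in proportion to the posterior probability that they are optimal, the pull counts concentrate on the best arm over time while the algorithm still occasionally explores the others.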

Papers

Showing 301–325 of 655 papers

Title | Status | Hype
Improving sample efficiency of high dimensional Bayesian optimization with MCMC | – | 0
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | – | 0
Incentivized Exploration for Multi-Armed Bandits under Reward Drift | – | 0
Incentivizing Combinatorial Bandit Exploration | – | 0
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | – | 0
Incentivizing Exploration with Linear Contexts and Combinatorial Actions | – | 0
Incorporating Behavioral Constraints in Online AI Systems | – | 0
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits | – | 0
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits | – | 0
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits | – | 0
A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | – | 0
Influencing Bandits: Arm Selection for Preference Shaping | – | 0
Combinatorial Neural Bandits | – | 0
Information Directed Sampling and Bandits with Heteroscedastic Noise | – | 0
Information Directed Sampling for Stochastic Bandits with Graph Feedback | – | 0
Information-Theoretic Confidence Bounds for Reinforcement Learning | – | 0
IntelligentPooling: Practical Thompson Sampling for mHealth | – | 0
Joint User Association and Pairing in Multi-UAV-Assisted NOMA Networks: A Decaying-Epsilon Thompson Sampling Framework | – | 0
Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network | – | 0
KLUCB Approach to Copeland Bandits | – | 0
Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits | – | 0
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio | – | 0
Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search | – | 0
An improved regret analysis for UCB-N and TS-N | – | 0
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | – | 0
Page 13 of 27

No leaderboard results yet.