SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
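The idea can be illustrated with a minimal sketch for the Beta-Bernoulli bandit, the classic setting: each arm keeps a Beta posterior over its success probability, and each round the algorithm draws one sample per arm from those posteriors and plays the arm whose sample is largest. The function name, signature, and simulation setup below are illustrative choices, not from any of the listed papers.

```python
import random

def thompson_sampling(true_probs, rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated multi-armed bandit.

    Each arm i has a Beta(alpha[i], beta[i]) posterior over its unknown
    success probability. Per round: sample one value from every posterior
    (a "randomly drawn belief"), play the arm whose sample is largest,
    then do the conjugate Bayesian update on the observed 0/1 reward.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # prior successes + 1 (uniform Beta(1, 1) prior)
    beta = [1.0] * k   # prior failures + 1
    total_reward = 0
    for _ in range(rounds):
        # Draw one belief sample per arm, act greedily on the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulated Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        # Conjugate posterior update for the played arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta, total_reward
```

Because sampling from the posterior naturally plays uncertain arms sometimes and high-mean arms often, exploration decays on its own: with arms of success probability 0.8 and 0.3, the posterior mass (and hence the pull count) concentrates on the better arm over the run.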

Papers

Showing 201-225 of 655 papers

Title | Status | Hype
Diffusion Models Meet Contextual Bandits with Large Action Spaces | | 0
DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation | | 0
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning | | 0
A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms | | 0
Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits | | 0
Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes | | 0
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits | | 0
Double-Linear Thompson Sampling for Context-Attentive Bandits | | 0
Counterfactual Inference under Thompson Sampling | | 0
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget | | 0
Double Thompson Sampling in Finite stochastic Games | | 0
Online Multi-Armed Bandits with Adaptive Inference | | 0
Doubly robust Thompson sampling for linear payoffs | | 0
Doubly Robust Thompson Sampling with Linear Payoffs | | 0
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN | | 0
Dual-Directed Algorithm Design for Efficient Pure Exploration | | 0
Counterfactual Data-Fusion for Online Reinforcement Learners | | 0
Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation | | 0
Dynamic Decision-Making under Model Misspecification | | 0
Asymptotically Optimal Bandits under Weighted Information | | 0
A General Theory of the Stochastic Linear Bandit and Its Applications | | 0
Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization | | 0
Efficient and Adaptive Posterior Sampling Algorithms for Bandits | | 0
Cost-efficient Knowledge-based Question Answering with Large Language Models | | 0
Page 9 of 27
