SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
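The idea above can be sketched concretely for the simplest case, a Bernoulli bandit with Beta posteriors: each round, draw one sample from every arm's posterior belief and play the arm whose sample is largest. This is a minimal illustration only; the arm reward probabilities and function name are hypothetical, not taken from any paper listed below.

```python
import random

def thompson_sampling(true_probs, rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling.

    Maintains a Beta(successes + 1, failures + 1) posterior per arm
    (uniform prior), samples one belief per round, and plays the arm
    whose sampled mean is largest.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(rounds):
        # Draw a randomly sampled belief about each arm's reward rate.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        # Act greedily with respect to the sampled beliefs.
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total_reward += reward
    return successes, failures, total_reward

# Hypothetical arm reward probabilities, for illustration only.
successes, failures, total = thompson_sampling([0.2, 0.5, 0.8])
```

Because poorly estimated arms occasionally produce large posterior samples, the algorithm keeps exploring them until their posteriors concentrate, after which play focuses on the best arm.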

Papers

Showing 301–350 of 655 papers

Title (every paper on this page lists no verification status and a Hype score of 0)

- Weak Signal Asymptotics for Sequentially Randomized Experiments
- Diffusion Models Meet Contextual Bandits with Large Action Spaces
- DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation
- Discounted Thompson Sampling for Non-Stationary Bandit Problems
- Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning
- Distributed Thompson Sampling
- Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes
- Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits
- Double-Linear Thompson Sampling for Context-Attentive Bandits
- Double Thompson Sampling in Finite stochastic Games
- Online Multi-Armed Bandits with Adaptive Inference
- Doubly robust Thompson sampling for linear payoffs
- Doubly Robust Thompson Sampling with Linear Payoffs
- DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN
- Dual-Directed Algorithm Design for Efficient Pure Exploration
- Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation
- Dynamic Decision-Making under Model Misspecification
- Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization
- Efficient and Adaptive Posterior Sampling Algorithms for Bandits
- Efficient Benchmarking of NLP APIs using Multi-armed Bandits
- Efficient Exploration for LLMs
- Efficient exploration of zero-sum stochastic games
- Efficient exploration with Double Uncertain Value Networks
- Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling
- Efficient kernelized bandit algorithms via exploration distributions
- Efficient Learning in Large-Scale Combinatorial Semi-Bandits
- Efficient Linear Bandits through Matrix Sketching
- Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
- Efficient Multivariate Bandit Algorithm with Path Planning
- Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling
- Efficient Thompson Sampling for Online Matrix-Factorization Recommendation
- Efficient-UCBV: An Almost Optimal Algorithm using Variance Estimates
- Eluder Dimension and the Sample Complexity of Optimistic Exploration
- ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment
- Ensemble Sampling
- Epinet for Content Cold Start
- Epsilon-Greedy Thompson Sampling to Bayesian Optimization
- Estimating prediction error for complex samples
- Estimating Quality in Multi-Objective Bandits Optimization
- Etat de l'art sur l'application des bandits multi-bras (State of the Art on the Application of Multi-Armed Bandits)
- EVaDE: Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
- From Predictions to Decisions: The Importance of Joint Predictive Distributions
- Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems
- Expected Improvement-based Contextual Bandits
- Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization
- A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting
- Exploration for Multi-task Reinforcement Learning with Deep Generative Models
- Exploration via linearly perturbed loss minimisation
- Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network
- Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization
Page 7 of 14

No leaderboard results yet.