SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
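The idea is easiest to see in the Bernoulli bandit case with Beta posteriors: sample one plausible success probability per arm from its posterior, play the arm whose sample is largest, then update that arm's posterior with the observed reward. A minimal sketch (the problem instance `true_probs` and all parameter choices are illustrative, not from any listed paper):

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Bernoulli bandit with independent Beta(1, 1) priors per arm (illustrative sketch)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # alpha parameters of each Beta posterior
    failures = [1] * n_arms   # beta parameters of each Beta posterior
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior belief over its mean reward...
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # ...and choose the action that maximizes reward under that random draw.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures, total_reward
```

Because each arm is chosen with the probability that it is optimal under the current posterior, play concentrates on the best arm as evidence accumulates, while uncertain arms still get explored.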

Papers

Showing 201–250 of 655 papers

Diffusion Models Meet Contextual Bandits with Large Action Spaces
DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation
Discounted Thompson Sampling for Non-Stationary Bandit Problems
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning
Distributed Thompson Sampling
Adaptive Combinatorial Allocation
Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits
Double-Linear Thompson Sampling for Context-Attentive Bandits
AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning
Bag of Policies for Distributional Deep Exploration
Double Thompson Sampling in Finite stochastic Games
Online Multi-Armed Bandits with Adaptive Inference
Doubly robust Thompson sampling for linear payoffs
Doubly Robust Thompson Sampling with Linear Payoffs
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN
Dual-Directed Algorithm Design for Efficient Pure Exploration
Bandit Convex Optimization: √T Regret in One Dimension
Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation
Dynamic Decision-Making under Model Misspecification
Bayesian Quantile and Expectile Optimisation
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits
Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization
Efficient and Adaptive Posterior Sampling Algorithms for Bandits
Efficient Benchmarking of NLP APIs using Multi-armed Bandits
Efficient Exploration for LLMs
Efficient exploration of zero-sum stochastic games
Bandits Under The Influence (Extended Version)
Efficient exploration with Double Uncertain Value Networks
Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling
Efficient kernelized bandit algorithms via exploration distributions
Efficient Learning in Large-Scale Combinatorial Semi-Bandits
Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Efficient Multivariate Bandit Algorithm with Path Planning
Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling
Batched Thompson Sampling for Multi-Armed Bandits
Efficient Thompson Sampling for Online Matrix-Factorization Recommendation
Efficient-UCBV: An Almost Optimal Algorithm using Variance Estimates
Eluder Dimension and the Sample Complexity of Optimistic Exploration
ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment
Ensemble Sampling
Epinet for Content Cold Start
Epsilon-Greedy Thompson Sampling to Bayesian Optimization
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies
Estimating prediction error for complex samples
A Copula approach for hyperparameter transfer learning
State of the Art on the Application of Multi-Armed Bandits (original title in French: "État de l'art sur l'application des bandits multi-bras")
EVaDE: Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation
Page 5 of 14

No leaderboard results yet.