
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each round, it draws one belief about the environment (e.g., a parameter vector) at random from the current posterior, then chooses the action that maximizes expected reward under that sampled belief.
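As a concrete illustration, below is a minimal sketch for a Bernoulli bandit with Beta priors. The function name, arm probabilities, and parameters are illustrative assumptions, not taken from any paper listed on this page.

import numpy as np

def thompson_sampling_bernoulli(true_probs, n_rounds=10_000, seed=0):
    """Thompson sampling for a Bernoulli multi-armed bandit.

    Each arm's unknown success probability gets a Beta(1, 1) prior.
    Every round we draw one sample from each arm's posterior, play the
    arm whose sample is largest, and update that arm's posterior.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_probs)
    successes = np.ones(n_arms)  # Beta alpha parameters (uniform prior)
    failures = np.ones(n_arms)   # Beta beta parameters (uniform prior)
    total_reward = 0.0
    for _ in range(n_rounds):
        # The "randomly drawn belief": one posterior sample per arm.
        theta = rng.beta(successes, failures)
        arm = int(np.argmax(theta))               # greedy w.r.t. the sample
        reward = rng.random() < true_probs[arm]   # pull the chosen arm
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures

# Example: three arms with unknown success rates 0.3, 0.5, 0.7.
reward, s, f = thompson_sampling_bernoulli([0.3, 0.5, 0.7])
print(f"total reward: {reward:.0f}")
print("posterior means:", s / (s + f))

Exploration here comes entirely from the randomness of the posterior draw: early on, the Beta posteriors are wide and every arm is sometimes the sampled maximizer, while as evidence accumulates the draws concentrate and the best arm is chosen almost always.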

Papers

Showing 201–250 of 655 papers

Diffusion Models Meet Contextual Bandits with Large Action Spaces
DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning
Debiasing Samples from Online Learning Using Bootstrap
Asymptotic Convergence of Thompson Sampling
Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits
Double-Linear Thompson Sampling for Context-Attentive Bandits
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models
Cover Tree Bayesian Reinforcement Learning
Double Thompson Sampling in Finite Stochastic Games
Online Multi-Armed Bandits with Adaptive Inference
Doubly robust Thompson sampling for linear payoffs
Doubly Robust Thompson Sampling with Linear Payoffs
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN
Dual-Directed Algorithm Design for Efficient Pure Exploration
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models
Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation
Dynamic Decision-Making under Model Misspecification
A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms
Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits
Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization
Efficient and Adaptive Posterior Sampling Algorithms for Bandits
Efficient Benchmarking of NLP APIs using Multi-armed Bandits
Efficient Exploration for LLMs
Efficient exploration of zero-sum stochastic games
Counterfactual Inference under Thompson Sampling
Efficient exploration with Double Uncertain Value Networks
Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget
Efficient Learning in Large-Scale Combinatorial Semi-Bandits
Counterfactual Data-Fusion for Online Reinforcement Learners
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Efficient Multivariate Bandit Algorithm with Path Planning
Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling
Asymptotically Optimal Bandits under Weighted Information
Efficient Thompson Sampling for Online Matrix-Factorization Recommendation
A General Theory of the Stochastic Linear Bandit and Its Applications
Eluder Dimension and the Sample Complexity of Optimistic Exploration
ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment
Ensemble Sampling
Cost-efficient Knowledge-based Question Answering with Large Language Models
Epsilon-Greedy Thompson Sampling to Bayesian Optimization
Cost Aware Asynchronous Multi-Agent Active Search
Estimating prediction error for complex samples
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits
Etat de l'art sur l'application des bandits multi-bras (in English: State of the Art on the Application of Multi-Armed Bandits)
EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
Convolutional Monte Carlo Rollouts in Go

Leaderboard

No leaderboard results yet.