SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
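The description above can be made concrete with a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The function name `thompson_sampling` and the parameters `true_means`, `n_rounds`, and `seed` are hypothetical names chosen for this illustration and are not taken from any listed paper. Each round draws one belief per arm from its posterior, then plays the arm whose drawn belief has the highest expected reward.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (illustrative only).

    true_means: hypothetical success probability of each arm, unknown to the learner.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior,
        # assuming a Beta(1, 1) prior.
        beliefs = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=beliefs.__getitem__)

        # Simulated environment: Bernoulli reward with the arm's true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Because arms with few observations have wide posteriors, their sampled beliefs are occasionally the largest, which produces exploration; as evidence accumulates, the posteriors concentrate and the best arm is chosen most of the time.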

Papers

Showing 401–450 of 655 papers

Title | Status | Hype
Robust Dynamic Assortment Optimization in the Presence of Outlier Customers | | 0
Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments | | 0
Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks | | 0
Safe Linear Leveling Bandits | | 0
Safe Linear Thompson Sampling with Side Information | | 0
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit | | 0
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity | | 0
Sampling Acquisition Functions for Batch Bayesian Optimization | | 0
Satisficing in Time-Sensitive Bandit Learning | | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | | 0
Scalable Generalized Linear Bandits: Online Computation and Hashing | | 0
Scalable Neural Contextual Bandit for Recommender Systems | | 0
Scalable regret for learning to control network-coupled subsystems with unknown dynamics | | 0
Scalable Thompson Sampling using Sparse Gaussian Process Models | | 0
Scalable Thompson Sampling via Optimal Transport | | 0
Scaling Multi-Armed Bandit Algorithms | | 0
Screening for an Infectious Disease as a Problem in Stochastic Control | | 0
Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization | | 0
Sequential Best-Arm Identification with Application to Brain-Computer Interface | | 0
Sequential Matrix Completion | | 0
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling | | 0
Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms | | 0
Simple Bayesian Algorithms for Best Arm Identification | | 0
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling | | 0
Sliding-Window Thompson Sampling for Non-Stationary Settings | | 0
Smart Routing with Precise Link Estimation: DSEE-Based Anypath Routing for Reliable Wireless Networking | | 0
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling | | 0
Sparse Nonparametric Contextual Bandits | | 0
Sparse Spectrum Gaussian Process for Bayesian Optimization | | 0
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism | | 0
SPRT-based Efficient Best Arm Identification in Stochastic Bandits | | 0
Stable Thompson Sampling: Valid Inference via Variance Inflation | | 0
Stage-wise Conservative Linear Bandits | | 0
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits | | 0
Stochastically Constrained Best Arm Identification with Thompson Sampling | | 0
Stochastic Neural Network with Kronecker Flow | | 0
Streaming kernel regression with provably adaptive mean, variance, and regularization | | 0
Surrogate modeling for Bayesian optimization beyond a single Gaussian process | | 0
Synthetically Controlled Bandits | | 0
Taming Non-stationary Bandits: A Bayesian Approach | | 0
Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits | | 0
Cramming Contextual Bandits for On-policy Statistical Evaluation | | 0
The Effect of Communication on Noncooperative Multiplayer Multi-Armed Bandit Problems | | 0
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits | | 0
The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle | | 0
The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation | | 0
The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling | | 0
The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies | | 0
The Typical Behavior of Bandit Algorithms | | 0
Thompson Exploration with Best Challenger Rule in Best Arm Identification | | 0

No leaderboard results yet.