SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration–exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief, i.e. a sample from the posterior distribution over models.
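To make "maximizing the expected reward with respect to a randomly drawn belief" concrete, below is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit with Beta priors; the arm success probabilities and horizon are hypothetical, chosen only for illustration.

```python
import random

def thompson_sampling(true_probs, horizon=10000, seed=0):
    """Beta-Bernoulli Thompson sampling: sample one belief per arm, play the argmax."""
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) uniform prior on each arm's success probability.
    successes = [1] * k
    failures = [1] * k
    for _ in range(horizon):
        # Draw a sampled belief from each arm's posterior Beta(s_i, f_i) ...
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # ... and play the arm whose sampled belief is highest.
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical arm probabilities; arm 2 is best, so its posterior
# should accumulate the vast majority of the pulls over the horizon.
s, f = thompson_sampling([0.3, 0.5, 0.7])
most_played = max(range(3), key=lambda i: s[i] + f[i])
```

Because each round's action is chosen greedily against a *sampled* posterior rather than the posterior mean, arms with uncertain estimates are still tried occasionally, which is what resolves the exploration–exploitation trade-off.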

Papers

Showing 251–300 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Mixed-Effect Thompson Sampling | Code | 0 |
| Surrogate modeling for Bayesian optimization beyond a single Gaussian process | — | 0 |
| Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits | — | 0 |
| Information-Directed Selection for Top-Two Algorithms | Code | 0 |
| Fast Change Identification in Multi-Play Bandits and its Applications in Wireless Networks | — | 0 |
| Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization | — | 0 |
| Adjusted Expected Improvement for Cumulative Regret Minimization in Noisy Bayesian Optimization | — | 0 |
| Non-Stationary Bandit Learning via Predictive Sampling | — | 0 |
| Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0 |
| Thompson Sampling for Bandit Learning in Matching Markets | Code | 0 |
| On Kernelized Multi-Armed Bandits with Constraints | — | 0 |
| Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking | Code | 0 |
| Thompson Sampling on Asymmetric α-Stable Bandits | — | 0 |
| Regenerative Particle Thompson Sampling | — | 0 |
| Multi-Agent Active Search using Detection and Location Uncertainty | — | 0 |
| Partial Likelihood Thompson Sampling | — | 0 |
| An Analysis of Ensemble Sampling | — | 0 |
| Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes | Code | 0 |
| Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework | — | 0 |
| Thompson Sampling with Unrestricted Delays | — | 0 |
| Double Thompson Sampling in Finite stochastic Games | — | 0 |
| Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation | — | 0 |
| Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network | — | 0 |
| Synthetically Controlled Bandits | — | 0 |
| Remote Contextual Bandits | — | 0 |
| Fourier Representations for Black-Box Optimization over Categorical Variables | — | 0 |
| On learning Whittle index policy for restless bandits with scalable regret | — | 0 |
| Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems | Code | 0 |
| Tsetlin Machine for Solving Contextual Bandit Problems | Code | 0 |
| Deep Hierarchy in Bandits | — | 0 |
| Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0 |
| Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0 |
| Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0 |
| Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems | — | 0 |
| IBAC: An Intelligent Dynamic Bandwidth Channel Access Avoiding Outside Warning Range Problem | — | 0 |
| On Dynamic Pricing with Covariates | — | 0 |
| Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization | — | 0 |
| Safe Linear Leveling Bandits | — | 0 |
| Risk and optimal policies in bandit experiments | — | 0 |
| Bayesian Optimization over Permutation Spaces | Code | 1 |
| Observation-Free Attacks on Stochastic Bandits | — | 0 |
| Doubly Robust Thompson Sampling with Linear Payoffs | — | 0 |
| Optimizing Conditional Value-At-Risk of Black-Box Functions | Code | 0 |
| Adaptive Gating for Single-Photon 3D Imaging | — | 0 |
| ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0 |
| Hierarchical Bayesian Bandits | — | 0 |
| The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle | — | 0 |
| Maillard Sampling: Boltzmann Exploration Done Optimally | — | 0 |
| Online Learning of Energy Consumption for Navigation of Electric Vehicles | — | 0 |
| Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling | — | 0 |
Page 6 of 14
