SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
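
As a rough illustration of the description above, here is a minimal sketch of Thompson sampling for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The function names `thompson_step` and `run_bandit` and the simulated reward probabilities are hypothetical, chosen only for this example, and are not taken from any paper listed here.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm and taking the best draw.

    Assumes Bernoulli rewards with a Beta(1, 1) prior, so the posterior for
    an arm with s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # Under the sampled belief, the expected reward of an arm is its sampled
    # success probability, so the maximizing action is the argmax.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate n_rounds of Thompson sampling on hypothetical arms."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward 1 with probability true_probs[arm].
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run_bandit([0.2, 0.5, 0.8], n_rounds=2000)
    print("pulls per arm:", [a + b for a, b in zip(s, f)])
```

Because each arm is selected with the probability that it is best under the current posterior, arms with uncertain estimates still get tried occasionally, while pulls concentrate on the arm that looks best as evidence accumulates.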

Papers

Showing 551–600 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Cascading Bandits for Large-Scale Recommendation Problems | Code | 0 |
| Causal Bandits for Linear Structural Equation Models | Code | 0 |
| Thompson Sampling: An Asymptotically Optimal Finite Time Analysis | Code | 0 |
| Scalable Exploration via Ensemble++ | Code | 0 |
| Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0 |
| Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0 |
| Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics | Code | 0 |
| Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0 |
| Machine Learning for Online Algorithm Selection under Censored Feedback | Code | 0 |
| Stacked Thompson Bandits | Code | 0 |
| Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0 |
| Online Learning of Decision Trees with Thompson Sampling | Code | 0 |
| Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0 |
| Vaccine allocation policy optimization and budget sharing mechanism using Thompson sampling | Code | 0 |
| Bayesian Algorithms for Decentralized Stochastic Bandits | Code | 0 |
| FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling | Code | 0 |
| Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning | Code | 0 |
| State-Aware Variational Thompson Sampling for Deep Q-Networks | Code | 0 |
| Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0 |
| Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0 |
| Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs | Code | 0 |
| Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling | Code | 0 |
| Adaptive Interventions with User-Defined Goals for Health Behavior Change | Code | 0 |
| A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0 |
| MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation | Code | 0 |
| Queueing Matching Bandits with Preference Feedback | Code | 0 |
| Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes | Code | 0 |
| On Provably Robust Meta-Bayesian Optimization | Code | 0 |
| Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures | Code | 0 |
| Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0 |
| Atlas: Automate Online Service Configuration in Network Slicing | Code | 0 |
| Scalable Optimization for Wind Farm Control using Coordination Graphs | Code | 0 |
| Variational inference for the multi-armed contextual bandit | Code | 0 |
| Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0 |
| Mixed-Effect Thompson Sampling | Code | 0 |
| On the Suboptimality of Thompson Sampling in High Dimensions | Code | 0 |
| Randomized Value Functions via Multiplicative Normalizing Flows | Code | 0 |
| Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0 |
| Ranking In Generalized Linear Bandits | Code | 0 |
| RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0 |
| Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits | Code | 0 |
| Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health | Code | 0 |
| Sub-sampling for Efficient Non-Parametric Bandit Exploration | Code | 0 |
| Information-Directed Selection for Top-Two Algorithms | Code | 0 |
| Thompson Sampling for a Fatigue-aware Online Recommendation System | Code | 0 |
| Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0 |
| Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0 |
| Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems | Code | 0 |
| More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling | Code | 0 |
| Mostly Exploration-Free Algorithms for Contextual Bandits | Code | 0 |

No leaderboard results yet.