
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
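As an illustration of the idea, here is a minimal sketch of Thompson sampling for the Bernoulli multi-armed bandit: each arm keeps a Beta posterior over its success probability, one sample is drawn from each posterior, and the arm with the highest sampled value is played. The function name, arm probabilities, and horizon below are illustrative, not from any of the listed papers.

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Bernoulli Thompson sampling with independent Beta(1, 1) priors per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # Beta posterior successes + 1
    beta = [1] * k   # Beta posterior failures + 1
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one belief sample per arm from its current posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...and play the arm that maximizes expected reward under that draw.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Over time, pulls concentrate on the arm with the highest true reward.
pulls = thompson_sampling([0.3, 0.5, 0.7], horizon=2000)
```

Because the action is chosen from a random posterior draw rather than the posterior mean, the algorithm explores arms in proportion to the probability that they are optimal.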

Papers

Showing 576–600 of 655 papers

| Title | Status | Hype |
|---|---|---|
| Queueing Matching Bandits with Preference Feedback | Code | 0 |
| Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes | Code | 0 |
| On Provably Robust Meta-Bayesian Optimization | Code | 0 |
| Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures | Code | 0 |
| Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0 |
| Atlas: Automate Online Service Configuration in Network Slicing | Code | 0 |
| Scalable Optimization for Wind Farm Control using Coordination Graphs | Code | 0 |
| Variational inference for the multi-armed contextual bandit | Code | 0 |
| Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0 |
| Mixed-Effect Thompson Sampling | Code | 0 |
| On the Suboptimality of Thompson Sampling in High Dimensions | Code | 0 |
| Randomized Value Functions via Multiplicative Normalizing Flows | Code | 0 |
| Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0 |
| Ranking In Generalized Linear Bandits | Code | 0 |
| RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0 |
| Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits | Code | 0 |
| Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health | Code | 0 |
| Sub-sampling for Efficient Non-Parametric Bandit Exploration | Code | 0 |
| Information-Directed Selection for Top-Two Algorithms | Code | 0 |
| Thompson Sampling for a Fatigue-aware Online Recommendation System | Code | 0 |
| Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0 |
| Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0 |
| Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems | Code | 0 |
| More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling | Code | 0 |
| Mostly Exploration-Free Algorithms for Contextual Bandits | Code | 0 |
Page 24 of 27

No leaderboard results yet.