
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
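The idea above can be sketched for the simplest case, Bernoulli-reward arms with Beta posteriors: sample one value from each arm's posterior, play the arm whose sample is largest, and update that arm's counts. This is a minimal illustration, not code from any paper listed below; the function names and the uniform Beta(1,1) prior are illustrative choices.

```python
import random

def thompson_step(successes, failures):
    """One round of Thompson sampling for Bernoulli bandits:
    draw one sample from each arm's Beta posterior (a randomly
    drawn belief) and pick the arm with the largest sample."""
    samples = [random.betavariate(s + 1, f + 1)  # Beta(1,1) uniform prior
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run(true_probs, rounds=5000, seed=0):
    """Simulate a Bernoulli bandit and return per-arm success/failure counts."""
    random.seed(seed)
    k = len(true_probs)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_step(successes, failures)
        if random.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because early rounds have wide posteriors, samples vary a lot and the algorithm explores; as counts accumulate, the posteriors concentrate and play shifts toward the empirically best arm.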

Papers

Showing 51–100 of 655 papers

Title | Status | Hype
Odds-Ratio Thompson Sampling to Control for Time-Varying Effect | Code | 0
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems | Code | 0
Multi-Agent Active Search using Realistic Depth-Aware Noise Model | Code | 0
Online Learning of Decision Trees with Thompson Sampling | Code | 0
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Code | 0
Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0
Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach | Code | 0
Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs | Code | 0
Information-Directed Exploration for Deep Reinforcement Learning | Code | 0
Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0
Randomized Value Functions via Multiplicative Normalizing Flows | Code | 0
Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0
Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0
Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking | Code | 0
Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0
Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning | Code | 0
Double Thompson Sampling for Dueling Bandits | Code | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0
Differentially Private Online Bayesian Estimation With Adaptive Truncation | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Distributed Thompson sampling under constrained communication | Code | 0
Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints | Code | 0
Efficient Exploration through Bayesian Deep Q-Networks | Code | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
Addressing Missing Data Issue for Diffusion-based Recommendation | Code | 0
ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0
Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo | Code | 0
Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0
Causal Bandits for Linear Structural Equation Models | Code | 0
FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling | Code | 0
Mixed-Effect Thompson Sampling | Code | 0
Improving Portfolio Optimization Results with Bandit Networks | Code | 0
Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems | Code | 0
Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling | Code | 0
Learning to Play Imperfect-Information Games by Imitating an Oracle Planner | Code | 0
Machine Learning for Online Algorithm Selection under Censored Feedback | Code | 0
Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0
MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation | Code | 0
Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0
Asynchronous ε-Greedy Bayesian Optimisation | Code | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
Asynchronous Parallel Bayesian Optimisation via Thompson Sampling | Code | 0
Atlas: Automate Online Service Configuration in Network Slicing | Code | 0
Bandit Learning with Implicit Feedback | Code | 0
Adaptive Interventions with User-Defined Goals for Health Behavior Change | Code | 0
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0
Page 2 of 14

Leaderboard

No leaderboard results yet.