SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
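
As a rough illustration of the description above, here is a minimal sketch of Thompson sampling for a Bernoulli bandit. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm's success probability; the function name `thompson_sampling_bernoulli`, its parameters, and the simulated `true_means` are hypothetical and chosen only for this example. Each round draws one belief per arm from its posterior and plays the arm whose draw is largest.

```python
import random


def thompson_sampling_bernoulli(true_means, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_means: hidden success probability of each arm (simulation only).
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one random belief per arm from its Beta posterior
        # (Beta(1, 1) prior updated with the observed counts).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Observe a simulated Bernoulli reward and update the posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling_bernoulli([0.1, 0.9], 2000)
    print("pulls per arm:", pulls)
```

Because the posterior of a clearly worse arm concentrates on low values, its sampled belief rarely wins, so most pulls tend to shift to the better arm as evidence accumulates.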

Papers

Showing 601-650 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo | Code | 0 |
| Thompson Sampling for Bandit Learning in Matching Markets | Code | 0 |
| Differentially Private Online Bayesian Estimation With Adaptive Truncation | Code | 0 |
| Multi-Agent Active Search using Realistic Depth-Aware Noise Model | Code | 0 |
| Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Code | 0 |
| Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking | Code | 0 |
| Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0 |
| Improving Portfolio Optimization Results with Bandit Networks | Code | 0 |
| Thompson Sampling for Robust Transfer in Multi-Task Bandits | Code | 0 |
| Sequential Monte Carlo Bandits | Code | 0 |
| Distributed Thompson sampling under constrained communication | Code | 0 |
| Thompson Sampling via Local Uncertainty | Code | 0 |
| Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0 |
| ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages | Code | 0 |
| Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints | Code | 0 |
| Double Thompson Sampling for Dueling Bandits | Code | 0 |
| Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0 |
| Randomized Exploration for Non-Stationary Stochastic Linear Bandits | Code | 0 |
| Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy | Code | 0 |
| Optimizing Conditional Value-At-Risk of Black-Box Functions | Code | 0 |
| Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach | Code | 0 |
| Asynchronous Parallel Bayesian Optimisation via Thompson Sampling | Code | 0 |
| Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0 |
| Addressing Missing Data Issue for Diffusion-based Recommendation | Code | 0 |
| Asynchronous ε-Greedy Bayesian Optimisation | Code | 0 |
| Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems | Code | 0 |
| Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling | Code | 0 |
| Information-Directed Exploration for Deep Reinforcement Learning | Code | 0 |
| VITS : Variational Inference Thompson Sampling for contextual bandits | Code | 0 |
| Representative Action Selection for Large Action-Space Meta-Bandits | Code | 0 |
| Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0 |
| Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox | Code | 0 |
| Efficient Exploration through Bayesian Deep Q-Networks | Code | 0 |
| Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations | Code | 0 |
| Thompson Sampling for Linearly Constrained Bandits | Code | 0 |
| Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages | Code | 0 |
| Tsetlin Machine for Solving Contextual Bandit Problems | Code | 0 |
| Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0 |
| Bandit Learning with Implicit Feedback | Code | 0 |
| Automated Creative Optimization for E-Commerce Advertising | Code | 0 |
| Thompson Sampling with Information Relaxation Penalties | Code | 0 |
| Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0 |
| Odds-Ratio Thompson Sampling to Control for Time-Varying Effect | Code | 0 |
| Old Dog Learns New Tricks: Randomized UCB for Bandit Problems | Code | 0 |
| Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0 |
| Trajectory-oriented optimization of stochastic epidemiological models | Code | 0 |
| On Bits and Bandits: Quantifying the Regret-Information Trade-off | Code | 0 |
| Learning to Play Imperfect-Information Games by Imitating an Oracle Planner | Code | 0 |
| Process-constrained batch Bayesian approaches for yield optimization in multi-reactor systems | Code | 0 |
| ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0 |

No leaderboard results yet.