Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
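As a concrete illustration, here is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit, using Beta posteriors over each arm's unknown reward probability. The function name, arm probabilities, and round count are illustrative choices, not from any of the papers listed below.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling for a multi-armed bandit.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    reward probability. Each round, we draw one sample from every arm's
    posterior and pull the arm whose sampled value is largest, i.e. we act
    greedily with respect to a randomly drawn belief.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # One posterior sample per arm; Beta(1, 1) is the uniform prior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        # Bernoulli reward from the chosen arm, then a conjugate update.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward
```

Because early uncertainty makes the posterior samples spread out, every arm gets explored; as evidence accumulates, the posteriors concentrate and the best arm is pulled almost exclusively, which is how the exploration-exploitation trade-off is resolved automatically.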

Papers

Showing 51–75 of 655 papers

Every paper on this page has code available (status: Code) and a hype score of 0.

- Double Thompson Sampling for Dueling Bandits
- Improving Portfolio Optimization Results with Bandit Networks
- Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints
- Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards
- Dynamic Assortment Selection and Pricing with Censored Preference Feedback
- Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling
- Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo
- MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation
- More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling
- Mostly Exploration-Free Algorithms for Contextual Bandits
- Addressing Missing Data Issue for Diffusion-based Recommendation
- Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking
- Distributed Thompson sampling under constrained communication
- Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling
- RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions
- Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning
- Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit
- Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers
- Cascading Bandits for Large-Scale Recommendation Problems
- Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach
- Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
- Bayesian Algorithms for Decentralized Stochastic Bandits
- Causal Bandits for Linear Structural Equation Models
- Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling
Page 3 of 27

No leaderboard results yet.