Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief at random from the current posterior distribution and chooses the action that maximizes the expected reward under that drawn belief.
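As an illustration of the idea described above, here is a minimal sketch for the Bernoulli bandit case. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm; the function names `thompson_step` and `run_bandit` and the example reward probabilities are hypothetical, chosen only for this sketch and not taken from any of the listed papers.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm from its Beta posterior.

    Assumption: Bernoulli rewards with a Beta(1, 1) prior, so the posterior
    for an arm with s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the sampled beliefs.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling and return the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Hypothetical arms; the third has the highest success probability.
    print(run_bandit([0.2, 0.5, 0.8], n_rounds=2000))
```

Because each arm's sample comes from its own posterior, arms with few observations still get chosen occasionally (exploration), while arms with strong evidence of high reward are chosen most of the time (exploitation).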

Papers

Showing 601–625 of 655 papers

Title | Status | Hype
Accelerating Approximate Thompson Sampling with Underdamped Langevin Monte Carlo | Code | 0
Thompson Sampling for Bandit Learning in Matching Markets | Code | 0
Differentially Private Online Bayesian Estimation With Adaptive Truncation | Code | 0
Multi-Agent Active Search using Realistic Depth-Aware Noise Model | Code | 0
Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays | Code | 0
Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking | Code | 0
Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0
Improving Portfolio Optimization Results with Bandit Networks | Code | 0
Thompson Sampling for Robust Transfer in Multi-Task Bandits | Code | 0
Sequential Monte Carlo Bandits | Code | 0
Distributed Thompson sampling under constrained communication | Code | 0
Thompson Sampling via Local Uncertainty | Code | 0
Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages | Code | 0
Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints | Code | 0
Double Thompson Sampling for Dueling Bandits | Code | 0
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0
Randomized Exploration for Non-Stationary Stochastic Linear Bandits | Code | 0
Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy | Code | 0
Optimizing Conditional Value-At-Risk of Black-Box Functions | Code | 0
Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach | Code | 0
Asynchronous Parallel Bayesian Optimisation via Thompson Sampling | Code | 0
Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0
Addressing Missing Data Issue for Diffusion-based Recommendation | Code | 0
Asynchronous ε-Greedy Bayesian Optimisation | Code | 0

No leaderboard results yet.