
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
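For a Bernoulli bandit this idea has a particularly simple form: keep a Beta posterior per arm, sample one value from each posterior, play the arm with the highest sample, and update that arm's posterior with the observed reward. A minimal sketch using only the standard library (the function name and parameters are illustrative, not from any particular package):

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling sketch for a simulated bandit."""
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) priors: alpha counts successes + 1, beta counts failures + 1
    alpha = [1] * k
    beta = [1] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior (the "randomly drawn belief")
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Play the arm whose sampled mean reward is highest
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the (here simulated) environment
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        # Conjugate posterior update for the played arm only
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta, total_reward
```

Because under-explored arms have wide posteriors, their samples occasionally come out on top, so the algorithm explores automatically and concentrates play on the best arm as evidence accumulates.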

Papers

Showing 301-350 of 655 papers

Title | Status | Hype
----- | ------ | ----
Variational Bayesian Optimistic Sampling | | 0
Differentially Private Federated Bayesian Optimization with Distributed Exploration | | 0
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0
Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes | | 0
Show Me the Whole World: Towards Entire Item Space Exploration for Interactive Personalized Recommendations | Code | 0
EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits | Code | 1
Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning | | 0
Batched Thompson Sampling | | 0
Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0
Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles | | 0
Expected Improvement-based Contextual Bandits | | 0
Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification | | 0
Deep Exploration for Recommendation Systems | | 0
Vaccine allocation policy optimization and budget sharing mechanism using Thompson sampling | Code | 0
Online Learning of Network Bottlenecks via Minimax Paths | | 0
Machine Learning for Online Algorithm Selection under Censored Feedback | Code | 0
Thompson Sampling for Bandits with Clustered Arms | | 0
A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems | | 0
Scalable regret for learning to control network-coupled subsystems with unknown dynamics | | 0
Batched Thompson Sampling for Multi-Armed Bandits | | 0
Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models | | 0
Debiasing Samples from Online Learning Using Bootstrap | | 0
Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce | | 0
From Predictions to Decisions: The Importance of Joint Predictive Distributions | | 0
GuideBoot: Guided Bootstrap for Deep Contextual Bandits | | 0
No Regrets for Learning the Prior in Bandits | | 0
Metalearning Linear Bandits by Prior Update | | 0
Bayesian decision-making under misspecified priors with applications to meta-learning | | 0
Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow | | 0
Random Effect Bandits | | 0
Thompson Sampling for Unimodal Bandits | | 0
Thompson Sampling with a Mixture Prior | | 0
Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian? | | 0
A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms | | 0
Parallelizing Thompson Sampling | | 0
Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits | | 0
Asymptotically Optimal Bandits under Weighted Information | | 0
Diffusion Approximations for Thompson Sampling | | 0
Thompson Sampling for Gaussian Entropic Risk Bandits | | 0
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks | Code | 1
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling | Code | 1
High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling | | 0
When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution | | 0
Blind Exploration and Exploitation of Stochastic Experts | | 0
Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments | | 0
Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection | | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Automated Creative Optimization for E-Commerce Advertising | Code | 0
Online Multi-Armed Bandits with Adaptive Inference | | 0
Page 7 of 14