SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
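The idea above can be sketched for the classic Bernoulli bandit with Beta priors: sample one belief per arm from its posterior, play the arm whose sampled mean is highest, then update that arm's posterior with the observed reward. This is a minimal illustration, not any specific paper's method; the arm probabilities, function name, and parameters are invented for the example.

```python
import random

def thompson_sampling(true_probs, n_rounds=5000, seed=42):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors on each arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta alpha parameters (start at the Beta(1, 1) uniform prior)
    failures = [1] * n_arms   # Beta beta parameters
    total_reward = 0
    for _ in range(n_rounds):
        # Draw a random belief about each arm's mean, then act greedily on the draw.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the chosen arm and update its posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures, total_reward
```

Because exploration comes from posterior sampling rather than an explicit bonus, arms with uncertain estimates are still tried occasionally, while clearly inferior arms are sampled less and less often as their posteriors concentrate.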

Papers

Showing 501–550 of 655 papers

Title | Status | Hype
Thompson Sampling with Virtual Helping Agents | - | 0
Time-Sensitive Bandit Learning and Satisficing Thompson Sampling | - | 0
Top Two Algorithms Revisited | - | 0
Towards Optimal Algorithms for Prediction with Expert Advice | - | 0
Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework | - | 0
Tree Ensembles for Contextual Bandits | - | 0
Truthful mechanisms for linear bandit games with private contexts | - | 0
TSEB: More Efficient Thompson Sampling for Policy Learning | - | 0
TSEC: a framework for online experimentation under experimental constraints | - | 0
TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation | - | 0
Two-Stage Resource Allocation in Reconfigurable Intelligent Surface Assisted Hybrid Networks via Multi-Player Bandits | - | 0
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | - | 0
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making | - | 0
Reinforcement Learning in Credit Scoring and Underwriting | - | 0
Unimodal Thompson Sampling for Graph-Structured Arms | - | 0
Using Adaptive Experiments to Rapidly Help Students | - | 0
Variable Selection via Thompson Sampling | - | 0
Variational Bayesian Optimistic Sampling | - | 0
WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings | - | 0
When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution | - | 0
When and why randomised exploration works (in linear bandits) | - | 0
When Combinatorial Thompson Sampling meets Approximation Regret | - | 0
Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation | - | 0
Zero-Inflated Bandits | - | 0
A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation | - | 0
A Batched Multi-Armed Bandit Approach to News Headline Testing | - | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | - | 0
A Bayesian Choice Model for Eliminating Feedback Loops | - | 0
Accelerating Grasp Exploration by Leveraging Learned Priors | - | 0
A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits | - | 0
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | - | 0
A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms | - | 0
A Combinatorial Semi-Bandit Approach to Charging Station Selection for Electric Vehicles | - | 0
A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | - | 0
A Copula approach for hyperparameter transfer learning | - | 0
A Quantile-based Approach for Hyperparameter Transfer Learning | - | 0
Fast Change Identification in Multi-Play Bandits and its Applications in Wireless Networks | - | 0
Active Reinforcement Learning with Monte-Carlo Tree Search | - | 0
Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling | - | 0
AdaptEx: A Self-Service Contextual Bandit Platform | - | 0
Adaptive Combinatorial Allocation | - | 0
Adaptive Data Augmentation for Thompson Sampling | - | 0
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches | - | 0
Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits | - | 0
Adaptive Gating for Single-Photon 3D Imaging | - | 0
Adaptive Grey-Box Fuzz-Testing with Thompson Sampling | - | 0
Adaptively Learning to Select-Rank in Online Platforms | - | 0
Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce | - | 0
Adaptive Model Selection Framework: An Application to Airline Pricing | - | 0
Adaptive Operator Selection Based on Dynamic Thompson Sampling for MOEA/D | - | 0
Page 11 of 14
