SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
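The description above can be made concrete with a minimal sketch, assuming a Bernoulli bandit with an independent uniform Beta(1, 1) prior on each arm (a common textbook setting, not something this page specifies). The function name `thompson_sampling` and its parameters `true_means`, `n_rounds`, and `seed` are hypothetical, chosen for illustration. In each round, one mean is drawn per arm from its posterior (the "randomly drawn belief"), and the arm with the highest draw is played.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative only).

    true_means: assumed success probability of each arm, hidden from the learner.
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible mean per arm from its Beta posterior.
        # Beta(1 + successes, 1 + failures) is the posterior under a uniform prior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_means[arm] else 0

        # Update the posterior counts of the arm that was played.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=500)
    print("pulls per arm:", pulls)
```

Because the posterior of a rarely played arm stays wide, that arm still occasionally produces the highest draw, which is where the exploration comes from. As evidence accumulates, the draws concentrate and play shifts toward the arm with the higher estimated mean.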

Papers

Showing 501-525 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Thompson Sampling with Virtual Helping Agents | | 0 |
| Time-Sensitive Bandit Learning and Satisficing Thompson Sampling | | 0 |
| Top Two Algorithms Revisited | | 0 |
| Towards Optimal Algorithms for Prediction with Expert Advice | | 0 |
| Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework | | 0 |
| Tree Ensembles for Contextual Bandits | | 0 |
| Truthful mechanisms for linear bandit games with private contexts | | 0 |
| TSEB: More Efficient Thompson Sampling for Policy Learning | | 0 |
| TSEC: a framework for online experimentation under experimental constraints | | 0 |
| TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation | | 0 |
| Two-Stage Resource Allocation in Reconfigurable Intelligent Surface Assisted Hybrid Networks via Multi-Player Bandits | | 0 |
| Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | | 0 |
| Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making | | 0 |
| Reinforcement Learning in Credit Scoring and Underwriting | | 0 |
| Unimodal Thompson Sampling for Graph-Structured Arms | | 0 |
| Using Adaptive Experiments to Rapidly Help Students | | 0 |
| Variable Selection via Thompson Sampling | | 0 |
| Variational Bayesian Optimistic Sampling | | 0 |
| WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings | | 0 |
| When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution | | 0 |
| When and why randomised exploration works (in linear bandits) | | 0 |
| When Combinatorial Thompson Sampling meets Approximation Regret | | 0 |
| Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation | | 0 |
| Zero-Inflated Bandits | | 0 |
| A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation | | 0 |

No leaderboard results yet.