SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior over reward models and chooses the action that maximizes the expected reward under that drawn belief.
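
As an illustration of the description above, here is a minimal sketch for the simplest case: a Bernoulli bandit with an independent Beta(1, 1) prior on each arm. The function name `thompson_sampling`, the parameter names, and the example success probabilities are hypothetical choices made for this sketch and do not come from any of the listed papers.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return pull counts per arm.

    true_probs holds the hidden success probability of each arm
    (an assumption of this sketch; a real system would not know it).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior on every arm: alpha counts successes, beta failures.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the drawn belief.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Observe a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8]))
```

Because the posterior of a clearly worse arm concentrates below the best arm's, its samples win the argmax less and less often, so exploration fades without an explicit schedule. Other reward models need a different posterior update, which this sketch does not cover.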

Papers

Showing 511-520 of 655 papers

Title | Status | Hype
Two-Stage Resource Allocation in Reconfigurable Intelligent Surface Assisted Hybrid Networks via Multi-Player Bandits |  | 0
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs |  | 0
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making |  | 0
Reinforcement Learning in Credit Scoring and Underwriting |  | 0
Unimodal Thompson Sampling for Graph-Structured Arms |  | 0
Using Adaptive Experiments to Rapidly Help Students |  | 0
Variable Selection via Thompson Sampling |  | 0
Variational Bayesian Optimistic Sampling |  | 0
WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings |  | 0
When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution |  | 0

No leaderboard results yet.