
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
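To make the "randomly drawn belief" idea concrete, here is a minimal sketch (not taken from this page) of Thompson sampling for a Bernoulli multi-armed bandit: each arm keeps a Beta posterior over its success probability, each round one sample is drawn from every posterior, and the arm whose sample is largest is played. The function name and arguments are illustrative.

```python
import random

def thompson_sampling(true_probs, num_rounds, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors.

    true_probs: hidden success probability of each arm (for simulation only).
    Returns the total reward collected and the final posterior parameters.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # 1 + observed successes per arm
    beta = [1] * n_arms   # 1 + observed failures per arm
    total_reward = 0
    for _ in range(num_rounds):
        # Draw one belief sample per arm, then act greedily on the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulate pulling the chosen arm and update its posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta
```

Because arms with uncertain posteriors occasionally produce high samples, the algorithm explores automatically; as evidence accumulates, the posteriors concentrate and play shifts toward the best arm.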

Papers

Showing 261-270 of 655 papers

Title | Status | Hype
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model | | 0
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications | | 0
A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization | | 0
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation | | 0
Contextual Bandits with Non-Stationary Correlated Rewards for User Association in MmWave Vehicular Networks | | 0
Contextual Bandits for Advertising Budget Allocation | | 0
A resource-constrained stochastic scheduling algorithm for homeless street outreach and gleaning edible food | | 0
Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling | | 0
Context Attribution with Multi-Armed Bandit Optimization | | 0
A Reliability-aware Multi-armed Bandit Approach to Learn and Select Users in Demand Response | | 0
Page 27 of 66

No leaderboard results yet.