SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
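
As a rough illustration of the description above, the sketch below runs Thompson sampling on a Bernoulli bandit. The Beta(1, 1) priors, the function name `thompson_sampling`, and the `true_probs` argument are assumptions made for this example and do not come from any of the listed papers. In each round it draws one belief per arm from that arm's posterior and plays the arm whose drawn belief is largest.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_probs: hypothetical success probability of each arm, unknown to the learner.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior,
        # starting from an assumed uniform Beta(1, 1) prior.
        beliefs = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=beliefs.__getitem__)

        # Simulate a Bernoulli reward and update the chosen arm's counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Arms with few observations have wide posteriors, so their drawn beliefs are occasionally large and they still get explored. As evidence accumulates, the draws concentrate and play shifts toward the arm with the highest estimated reward.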

Papers

Showing 211–220 of 655 papers

Title | Status | Hype
Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling | | 0
A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning | | 0
A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms | | 0
Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors | | 0
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0
When Combinatorial Thompson Sampling meets Approximation Regret | | 0
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits | | 0
A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation | | 0
Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration | | 0
Leveraging Demonstrations to Improve Online Learning: Quality Matters | | 0

No leaderboard results yet.