
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
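The definition above — sample a belief about each arm's payoff from its posterior, then act greedily on that sample — can be sketched with a Beta-Bernoulli bandit. This is a minimal illustrative sketch (the function name, arm probabilities, and round count are assumptions for the example, not taken from this page):

```python
import random

def thompson_sampling(true_probs, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy multi-armed bandit.

    Each arm's unknown reward probability gets a Beta(s+1, f+1) posterior,
    where s and f count observed successes and failures. Each round we draw
    one sample per arm (the "randomly drawn belief") and pull the arm whose
    sample is largest. Returns the pull count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(rounds):
        # Sample a plausible reward probability for each arm from its posterior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        # Exploit the sampled belief: play the arm that looks best under it.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [successes[a] + failures[a] for a in range(n_arms)]

pulls = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
print(pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, the algorithm keeps exploring early on, but as evidence accumulates the posteriors concentrate and pulls shift toward the truly best arm.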

Papers

Showing 51–60 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0 |
| Sparse Nonparametric Contextual Bandits | | 0 |
| Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0 |
| Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | | 0 |
| An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | | 0 |
| Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | | 0 |
| When and why randomised exploration works (in linear bandits) | | 0 |
| KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems | | 0 |
| Contextual Thompson Sampling via Generation of Missing Data | | 0 |
| An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces | | 0 |
Page 6 of 66

No leaderboard results yet.