
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
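The idea of "maximizing expected reward with respect to a randomly drawn belief" can be made concrete with a minimal sketch for Bernoulli-reward arms: each arm gets a Beta(1, 1) prior over its unknown success probability, each round we draw one sample from every arm's posterior and play the arm with the highest sample, then update that arm's posterior with the observed reward. The function name, arm probabilities, and round count below are illustrative, not from any specific paper on this page.

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Thompson sampling for a Bernoulli multi-armed bandit (illustrative sketch).

    Each arm's unknown success probability has a Beta(alpha, beta) posterior,
    starting from the uniform Beta(1, 1) prior. Returns the pull count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # successes + 1 (Beta shape parameter)
    beta = [1] * n_arms   # failures + 1 (Beta shape parameter)
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one belief sample per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled beliefs.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (unknown to the agent) true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8])
```

Because the posterior for a clearly better arm concentrates quickly, the agent's pulls shift toward the best arm (here the third, with success probability 0.8) while still occasionally exploring the others.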

Papers

Showing 21–30 of 655 papers

Title | Status | Hype
Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret | - | 0
Bayesian learning of the optimal action-value function in a Markov decision process | - | 0
Neural Contextual Bandits Under Delayed Feedback Constraints | - | 0
Counterfactual Inference under Thompson Sampling | - | 0
Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0
Sparse Nonparametric Contextual Bandits | - | 0
Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | - | 0
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | - | 0
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | - | 0
Page 3 of 66

No leaderboard results yet.