
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
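The idea of "maximizing expected reward with respect to a randomly drawn belief" can be made concrete with a minimal sketch for Bernoulli-reward arms: each arm gets a Beta(1, 1) prior over its unknown success probability, each round we draw one sample from every arm's posterior and play the arm with the highest sample, then update that arm's posterior with the observed reward. The function name, arm probabilities, and round count below are illustrative, not from any specific paper on this page.

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Thompson sampling for a Bernoulli multi-armed bandit (illustrative sketch).

    Each arm's unknown success probability has a Beta(alpha, beta) posterior,
    starting from the uniform Beta(1, 1) prior. Returns the pull count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # successes + 1 (Beta shape parameter)
    beta = [1] * n_arms   # failures + 1 (Beta shape parameter)
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one belief sample per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled beliefs.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (unknown to the agent) true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.5, 0.8])
```

Because the posterior for a clearly better arm concentrates quickly, the agent's pulls shift toward the best arm (here the third, with success probability 0.8) while still occasionally exploring the others.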

Papers

Showing 21–30 of 655 papers

Title | Status | Hype
Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret | - | 0
Bayesian learning of the optimal action-value function in a Markov decision process | - | 0
Neural Contextual Bandits Under Delayed Feedback Constraints | - | 0
Counterfactual Inference under Thompson Sampling | - | 0
Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0
Sparse Nonparametric Contextual Bandits | - | 0
Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | - | 0
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | - | 0
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | - | 0
Page 3 of 66

No leaderboard results yet.