
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior distribution over the unknown reward parameters, then chooses the action that maximizes the expected reward under that drawn belief.
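The procedure described above can be illustrated with a minimal sketch for the Bernoulli bandit case. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm's success probability; the function name `thompson_sampling`, its parameters, and the example arm probabilities are hypothetical choices for illustration and do not come from any paper listed on this page.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hidden success probability of each arm (simulation only;
                the sampler never reads these directly when choosing).
    Returns (pulls, successes, failures), one entry per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one plausible mean reward per arm,
        # sampled from that arm's Beta posterior (Beta(1, 1) prior assumed).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the drawn belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Posterior update: count the outcome for the chosen arm.
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.2, 0.5, 0.8], 2000)
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they occasionally produce the largest sample and get explored; as evidence accumulates, the posteriors narrow and most pulls go to the arm with the highest estimated reward.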

Papers



Title	Date	Tasks	Status
Multi-armed Bandits with Cost Subsidy	Nov 3, 2020	Multi-Armed Bandits, Thompson Sampling	Unverified
Multi-dueling Bandits with Dependent Arms	Apr 29, 2017	Thompson Sampling	Unverified
Multi-Task Combinatorial Bandits for Budget Allocation	Aug 31, 2024	Gaussian Processes, Marketing	Unverified
Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses	Aug 21, 2020	Adversarial Attack, Thompson Sampling	Unverified
Neural Contextual Bandits Under Delayed Feedback Constraints	Apr 16, 2025	Multi-Armed Bandits, Recommendation Systems	Unverified
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback	Jul 24, 2024	Thompson Sampling	Unverified
Neural Model-based Optimization with Right-Censored Observations	Sep 29, 2020	model, regression	Unverified
New Insights into Bootstrapping for Bandits	May 24, 2018	Thompson Sampling	Unverified
No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling	May 23, 2024	Thompson Sampling	Unverified
Nonparametric General Reinforcement Learning	Nov 28, 2016	General Reinforcement Learning, reinforcement-learning	Unverified


