
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
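As a rough illustration of the idea above, here is a minimal sketch for the Bernoulli bandit, assuming a uniform Beta(1, 1) prior on each arm's success probability. In this setting the expected reward under a drawn belief is just the sampled probability, so the rule reduces to playing the arm with the largest sample. The function names (`thompson_step`, `run_bandit`) and the simulated reward probabilities are hypothetical and chosen only for illustration.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one sample from each arm's Beta posterior.

    Assumes a Beta(1, 1) prior, so the posterior for an arm with s successes
    and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # The sampled value is the expected reward under the drawn belief,
    # so we act greedily with respect to the samples.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit; return pulls per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with the arm's true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical reward probabilities, for demonstration only.
    print(run_bandit([0.2, 0.5, 0.8], n_rounds=2000))
```

Early on the posteriors are wide, so every arm is sampled often (exploration); as evidence accumulates, the samples concentrate and the best arm is chosen most of the time (exploitation).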

Papers



Title	Date	Tasks	Status
Bag of Policies for Distributional Deep Exploration	Aug 3, 2023	Atari Games, Efficient Exploration	Unverified
Double Thompson Sampling in Finite stochastic Games	Feb 21, 2022	Thompson Sampling	Unverified
Online Multi-Armed Bandits with Adaptive Inference	Feb 25, 2021	Causal Inference, Decision Making	Unverified
Doubly robust Thompson sampling for linear payoffs	Feb 1, 2021	Thompson Sampling	Unverified
Doubly Robust Thompson Sampling with Linear Payoffs	Dec 1, 2021	Thompson Sampling	Unverified
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN	Jul 16, 2024	Decision Making, Deep Reinforcement Learning	Unverified
Dual-Directed Algorithm Design for Efficient Pure Exploration	Oct 30, 2023	Thompson Sampling	Unverified
Bandit Convex Optimization: √T Regret in One Dimension	Feb 23, 2015	Thompson Sampling	Unverified
Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation	Aug 25, 2022	Collaborative Filtering, Recommendation Systems	Unverified
Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce	Jul 30, 2021	Thompson Sampling	Unverified

