SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
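As a rough illustration of the definition above, the sketch below applies Thompson sampling to a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm's success rate, which is a common textbook choice and is not drawn from any paper listed here. The names `thompson_step` and `run_bandit` and the example reward rates are hypothetical.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm: sample one belief per arm, then act greedily on it."""
    # Assumption: Beta(1, 1) prior, so the posterior for an arm with
    # s successes and f failures is Beta(s + 1, f + 1).
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # The "randomly drawn belief" is the sampled rate vector; pick its argmax.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_rates, rounds=2000, seed=0):
    """Simulate a Bernoulli bandit and return the pull count of each arm."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward 1 with the arm's true probability.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Hypothetical reward rates, chosen only for the demonstration.
    print(run_bandit([0.2, 0.5, 0.8]))
```

Early on the posteriors are wide, so every arm has a real chance of producing the largest sample and being explored. As evidence accumulates, the samples concentrate near the true rates and the best arm is chosen most of the time.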

Papers

Showing 211–220 of 655 papers

Title | Status | Hype
Bag of Policies for Distributional Deep Exploration | | 0
Double Thompson Sampling in Finite stochastic Games | | 0
Online Multi-Armed Bandits with Adaptive Inference | | 0
Doubly robust Thompson sampling for linear payoffs | | 0
Doubly Robust Thompson Sampling with Linear Payoffs | | 0
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN | | 0
Dual-Directed Algorithm Design for Efficient Pure Exploration | | 0
Bandit Convex Optimization: √T Regret in One Dimension | | 0
Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation | | 0
Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce | | 0

No leaderboard results yet.