SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief at random from the current posterior distribution and chooses the action that maximizes the expected reward under that sampled belief.
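To make the description concrete, here is a minimal sketch for the simplest case: a Bernoulli bandit with a uniform Beta(1, 1) prior on each arm. The conjugate Beta-Bernoulli model, the function names `thompson_step` and `run_bandit`, and the example reward probabilities are all assumptions chosen for illustration; none of them come from the papers listed on this page.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample one belief per arm from its Beta posterior; return the best arm's index."""
    # With a Beta(1, 1) prior, the posterior after s successes and f failures
    # is Beta(s + 1, f + 1). Each draw is a "randomly drawn belief" about that arm.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the sampled beliefs.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds=2000, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (hypothetical helper)."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # The environment returns a 0/1 reward with the arm's true success probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, successes, failures


# Example: three arms with made-up success probabilities.
pulls, successes, failures = run_bandit([0.2, 0.5, 0.8])
print(pulls)
```

Early on the posteriors are wide, so every arm has a reasonable chance of producing the largest sample and being explored. As evidence accumulates the posteriors concentrate, and the arm with the highest estimated reward is chosen most of the time.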

Papers

Showing 211–220 of 655 papers

Title | Status | Hype
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget | | 0
Double Thompson Sampling in Finite stochastic Games | | 0
Online Multi-Armed Bandits with Adaptive Inference | | 0
Doubly robust Thompson sampling for linear payoffs | | 0
Doubly Robust Thompson Sampling with Linear Payoffs | | 0
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN | | 0
Dual-Directed Algorithm Design for Efficient Pure Exploration | | 0
Counterfactual Data-Fusion for Online Reinforcement Learners | | 0
Dynamic collaborative filtering Thompson Sampling for cross-domain advertisements recommendation | | 0
Asymptotically Optimal Bandits under Weighted Information | | 0

No leaderboard results yet.