SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
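As a rough illustration of the description above, here is a minimal sketch for the Bernoulli-reward case. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The function names `thompson_step` and `run_bandit` and the simulated `true_means` are hypothetical and are not taken from any of the listed papers.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm: draw one belief per arm, then act greedily on that draw."""
    # Assumption: Beta(1, 1) prior, so the posterior for an arm with s successes
    # and f failures is Beta(1 + s, 1 + f).
    samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    # The sampled value is the expected reward under the drawn belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit with hypothetical arm means."""
    rng = random.Random(seed)
    successes = [0] * len(true_means)
    failures = [0] * len(true_means)
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Calling `run_bandit([0.1, 0.9], 2000)` returns per-arm success and failure counts. In a run like this, most pulls would be expected to concentrate on the better arm as its posterior sharpens.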

Papers

Showing 71-80 of 655 papers

Title | Status | Hype
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications | - | 0
Optimization-Driven Adaptive Experimentation | - | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic | Code | 0
Process-constrained batch Bayesian approaches for yield optimization in multi-reactor systems | Code | 0
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback | - | 0
Thompson Sampling Itself is Differentially Private | - | 0
Scalable Exploration via Ensemble++ | Code | 0
DRL-based Joint Resource Scheduling of eMBB and URLLC in O-RAN | - | 0
Joint User Association and Pairing in Multi-UAV-Assisted NOMA Networks: A Decaying-Epsilon Thompson Sampling Framework | - | 0
Page 8 of 66

No leaderboard results yet.