
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
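The idea can be illustrated with a minimal sketch for the Bernoulli bandit: keep a Beta posterior over each arm's success probability, sample one value per arm from those posteriors, and play the arm with the largest sample. The function name and parameters below are illustrative, not from any specific paper on this page.

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a multi-armed bandit.

    true_probs: the (unknown to the agent) success probability of each arm.
    Returns the per-arm success/failure counts and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior on each arm
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior belief...
        samples = [rng.betavariate(successes[i], failures[i])
                   for i in range(n_arms)]
        # ...and act greedily with respect to that randomly drawn belief.
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures, total_reward
```

Because arms with uncertain posteriors occasionally produce large samples, the algorithm keeps exploring them; as evidence accumulates, the posteriors concentrate and play shifts toward the best arm.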

Papers

Showing 71–80 of 655 papers

Title | Status | Hype
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Double Thompson Sampling for Dueling Bandits | Code | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning | Code | 0
Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling | Code | 0
Bayesian Algorithms for Decentralized Stochastic Bandits | Code | 0
Differentially Private Online Bayesian Estimation With Adaptive Truncation | Code | 0
Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems | Code | 0
Bandit Learning with Implicit Feedback | Code | 0
Page 8 of 66

No leaderboard results yet.