
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
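The idea above can be sketched for the Bernoulli multi-armed bandit, where the natural belief over each arm's success probability is a Beta distribution: draw one sample from each arm's posterior, play the arm whose sample is largest, then update that arm's posterior with the observed reward. This is a minimal illustrative sketch, not drawn from any particular paper on this page; the function name and parameters are my own.

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Bernoulli Thompson sampling with independent Beta(1, 1) priors.

    true_probs: the (unknown to the agent) success probability of each arm.
    Returns the per-arm Beta parameters and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k  # alpha parameter of each arm's Beta posterior
    failures = [1] * k   # beta parameter of each arm's Beta posterior
    total_reward = 0
    for _ in range(n_rounds):
        # Randomly draw a belief: one sample per arm from its posterior.
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # Act greedily with respect to that sampled belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures, total_reward
```

Because an arm with an uncertain posterior still occasionally produces the largest sample, the algorithm keeps exploring, while posterior mass concentrating on the best arm makes it exploit more over time.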

Papers

Showing 31–40 of 655 papers

Title | Status | Hype
Scalable Exploration via Ensemble++ | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints | Code | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
Causal Bandits for Linear Structural Equation Models | Code | 0
Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic | Code | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
Double Thompson Sampling for Dueling Bandits | Code | 0
Page 4 of 66

No leaderboard results yet.