SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
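The "randomly drawn belief" step can be sketched for Bernoulli bandits, where each arm keeps a Beta posterior over its success probability: sample once from each posterior, pull the arm whose sample is largest, and update that arm's posterior with the observed reward. A minimal sketch (function name and parameters are illustrative, not from any paper listed below):

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors on each arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # posterior alpha = successes + 1
    failures = [0] * n_arms   # posterior beta  = failures + 1
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior: a random belief
        # about that arm's reward probability.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

Because an arm's posterior concentrates as it is pulled, clearly bad arms stop winning the sampling step, and play shifts toward the best arm without any explicit exploration schedule.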

Papers

Showing 611–620 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Distributed Thompson sampling under constrained communication | Code | 0 |
| Thompson Sampling via Local Uncertainty | Code | 0 |
| Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0 |
| ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages | Code | 0 |
| Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints | Code | 0 |
| Double Thompson Sampling for Dueling Bandits | Code | 0 |
| Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models | Code | 0 |
| Randomized Exploration for Non-Stationary Stochastic Linear Bandits | Code | 0 |
| Neural Bandits for Data Mining: Searching for Dangerous Polypharmacy | Code | 0 |
| Optimizing Conditional Value-At-Risk of Black-Box Functions | Code | 0 |
Page 62 of 66

No leaderboard results yet.