SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
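The description above can be made concrete with a minimal sketch for the simplest setting, a Bernoulli bandit with a Beta(1, 1) prior on each arm. That setting, the function names `thompson_step` and `run_bandit`, and the reward rates in the usage note are assumptions chosen for illustration; none of them comes from the papers listed here. Each round, one plausible reward rate per arm is drawn from that arm's posterior (the "randomly drawn belief"), and the arm with the highest draw is played.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm: draw one reward rate per arm from its Beta posterior
    and return the index of the arm whose draw is largest."""
    samples = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior plus observed counts
        for s, f in zip(successes, failures)
    ]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_rates, n_rounds=2000, seed=0):
    """Simulate a Bernoulli bandit and return how often each arm was pulled.

    true_rates are hypothetical success probabilities, hidden from the policy.
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_rates[arm].
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]
```

Calling `run_bandit([0.2, 0.5, 0.8])` returns one pull count per arm. Early rounds spread pulls across arms because the posteriors are wide (exploration); as evidence accumulates, the draws for the best arm dominate and it receives most of the pulls (exploitation).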

Papers

Showing 26-50 of 655 papers

Title | Status | Hype
Neural Thompson Sampling | Code | 1
Batched Bayesian optimization by maximizing the probability of including the optimum | Code | 1
Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0
ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0
Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0
Scalable Exploration via Ensemble++ | Code | 0
Efficient Exploration through Bayesian Deep Q-Networks | Code | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0
Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0
Double Thompson Sampling for Dueling Bandits | Code | 0
Distributed Thompson sampling under constrained communication | Code | 0
AIXIjs: A Software Demo for General Reinforcement Learning | Code | 0
Differentially Private Online Bayesian Estimation With Adaptive Truncation | Code | 0
Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints | Code | 0
Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0
Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
Causal Bandits for Linear Structural Equation Models | Code | 0
Process-constrained batch Bayesian approaches for yield optimization in multi-reactor systems | Code | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic | Code | 0
Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0

No leaderboard results yet.