SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief (a set of model parameters) at random from the current posterior distribution and chooses the action that maximizes the expected reward under that drawn belief.
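As an illustration of the description above, here is a minimal sketch for the simplest setting: a Bernoulli bandit with a uniform Beta(1, 1) prior on each arm. The setting, the prior, and the names `thompson_step` and `run_bandit` are assumptions made for this example and do not come from any of the listed papers.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm from its Beta posterior.

    With a Beta(1, 1) prior (an assumption of this sketch), the posterior for
    an arm with s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Act greedily with respect to the randomly drawn belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (hypothetical helper).

    true_probs holds the hidden success probability of each arm; it is used
    only to generate rewards, never to choose actions.
    """
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Arms with little data have wide posteriors, so their sampled values are occasionally the largest and they still get explored; as evidence accumulates, the draws concentrate and the better arm is chosen most of the time.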

Papers

Showing 551–575 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Cascading Bandits for Large-Scale Recommendation Problems | Code | 0 |
| Causal Bandits for Linear Structural Equation Models | Code | 0 |
| Thompson Sampling: An Asymptotically Optimal Finite Time Analysis | Code | 0 |
| Scalable Exploration via Ensemble++ | Code | 0 |
| Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0 |
| Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0 |
| Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics | Code | 0 |
| Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0 |
| Machine Learning for Online Algorithm Selection under Censored Feedback | Code | 0 |
| Stacked Thompson Bandits | Code | 0 |
| Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0 |
| Online Learning of Decision Trees with Thompson Sampling | Code | 0 |
| Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0 |
| Vaccine allocation policy optimization and budget sharing mechanism using Thompson sampling | Code | 0 |
| Bayesian Algorithms for Decentralized Stochastic Bandits | Code | 0 |
| FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling | Code | 0 |
| Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning | Code | 0 |
| State-Aware Variational Thompson Sampling for Deep Q-Networks | Code | 0 |
| Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0 |
| Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0 |
| Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs | Code | 0 |
| Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling | Code | 0 |
| Adaptive Interventions with User-Defined Goals for Health Behavior Change | Code | 0 |
| A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits | Code | 0 |
| MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation | Code | 0 |
