SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
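In the classic Bernoulli bandit, this means keeping a Beta posterior over each arm's success probability, sampling once from every posterior each round, and playing the arm whose sample is largest. A minimal sketch (the three-armed bandit and its payout probabilities are hypothetical, chosen for illustration):

```python
import random

def thompson_sampling_step(successes, failures):
    """One round of Thompson sampling for a Bernoulli bandit.

    Each arm's success probability has a Beta(successes + 1, failures + 1)
    posterior (uniform prior). Draw one sample per arm and play the arm
    with the largest sample.
    """
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Hypothetical 3-armed bandit; arm 2 is best.
true_probs = [0.2, 0.5, 0.8]
successes = [0, 0, 0]
failures = [0, 0, 0]

random.seed(0)
for _ in range(2000):
    arm = thompson_sampling_step(successes, failures)
    if random.random() < true_probs[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Because arms are chosen in proportion to the posterior probability that they are optimal, pulls concentrate on the best arm as evidence accumulates, while arms with uncertain posteriors still get occasional exploratory pulls.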

Papers

Showing 326–350 of 655 papers

- Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling
- Efficient Thompson Sampling for Online Matrix-Factorization Recommendation
- Efficient-UCBV: An Almost Optimal Algorithm using Variance Estimates
- Eluder Dimension and the Sample Complexity of Optimistic Exploration
- ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment
- Ensemble Sampling
- Epinet for Content Cold Start
- Epsilon-Greedy Thompson Sampling to Bayesian Optimization
- Estimating prediction error for complex samples
- Estimating Quality in Multi-Objective Bandits Optimization
- Etat de l'art sur l'application des bandits multi-bras (State of the art on the application of multi-armed bandits)
- EVaDE: Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning
- From Predictions to Decisions: The Importance of Joint Predictive Distributions
- Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems
- Expected Improvement-based Contextual Bandits
- Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization
- A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting
- Exploration for Multi-task Reinforcement Learning with Deep Generative Models
- Exploration via linearly perturbed loss minimisation
- Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network
- Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization
- Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning
- Feel-Good Thompson Sampling for Contextual Dueling Bandits
- Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits
- First-Order Bayesian Regret Analysis of Thompson Sampling
Page 14 of 27
