
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each round, it chooses the action that maximizes the expected reward with respect to a belief drawn at random from the current posterior distribution.
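The sketch below shows the idea in its most common textbook setting. It is an illustration, not the method of any particular paper in this list. It assumes Bernoulli (0/1) rewards and an independent Beta(1, 1) prior on each arm's success probability. Each round draws one posterior sample per arm, plays the arm whose sample is largest, and then applies the conjugate Beta update. The function name `thompson_sampling` and the simulated arm probabilities are hypothetical.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm; the agent never sees these.
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) is a uniform prior over each arm's success probability.
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one random belief per arm from its current posterior ...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ... and act greedily with respect to that random draw.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the (hidden) true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: successes go to alpha, failures to beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


if __name__ == "__main__":
    pulls, reward = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
    print("pulls per arm:", pulls, "total reward:", reward)
```

Exploration comes from the randomness of the posterior draws. Arms with few observations have wide posteriors and occasionally produce the largest sample. As evidence accumulates, the posteriors concentrate, so the best arm ends up being played most of the time.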

Papers

Showing 501-550 of 655 papers

Title | Status | Hype
Randomised Bayesian Least-Squares Policy Iteration | - | 0
Sampling Acquisition Functions for Batch Bayesian Optimization | - | 0
On Multi-Armed Bandit Designs for Dose-Finding Clinical Trials | - | 0
Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics | Code | 0
Meta Dynamic Pricing: Transfer Learning Across Experiments | - | 0
Constrained Thompson Sampling for Wireless Link Optimization | - | 0
Fully Distributed Bayesian Optimization with Stochastic Policies | - | 0
Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes | - | 0
Scalable Thompson Sampling via Optimal Transport | - | 0
Thompson Sampling with Information Relaxation Penalties | Code | 0
KLUCB Approach to Copeland Bandits | - | 0
First-Order Bayesian Regret Analysis of Thompson Sampling | - | 0
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model | - | 0
Thompson Sampling for a Fatigue-aware Online Recommendation System | Code | 0
Parallel Contextual Bandits in Wireless Handover Optimization | - | 0
Information-Directed Exploration for Deep Reinforcement Learning | Code | 0
MergeDTS: A Method for Effective Large-Scale Online Ranker Evaluation | Code | 0
Thompson Sampling for Noncompliant Bandits | - | 0
Bandit Learning with Implicit Feedback | Code | 0
Optimal Learning for Dynamic Coding in Deadline-Constrained Multi-Channel Networks | - | 0
Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0
Thompson Sampling for Pursuit-Evasion Problems | - | 0
Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0
A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting | - | 0
Combining Bayesian Optimization and Lipschitz Optimization | - | 0
Thompson Sampling Algorithms for Cascading Bandits | - | 0
Contextual Multi-Armed Bandits for Causal Marketing | - | 0
Efficient Linear Bandits through Matrix Sketching | - | 0
Incorporating Behavioral Constraints in Online AI Systems | - | 0
Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms | - | 0
Adaptive Grey-Box Fuzz-Testing with Thompson Sampling | - | 0
Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0
Sequential Monte Carlo Bandits | Code | 0
Deep Contextual Multi-armed Bandits | - | 0
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits | - | 0
Optimization of a SSP's Header Bidding Strategy using Thompson Sampling | - | 0
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems | - | 0
On The Differential Privacy of Thompson Sampling With Gaussian Prior | - | 0
Randomized Value Functions via Multiplicative Normalizing Flows | Code | 0
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling | - | 0
An Information-Theoretic Analysis for Thompson Sampling with Many Actions | - | 0
Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0
New Insights into Bootstrapping for Bandits | - | 0
Analysis of Thompson Sampling for Graphical Bandits Without the Graphs | - | 0
PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits | - | 0
Profitable Bandits | - | 0
Thompson Sampling for Combinatorial Semi-Bandits | - | 0
Active Reinforcement Learning with Monte-Carlo Tree Search | - | 0
Satisficing in Time-Sensitive Bandit Learning | - | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0

No leaderboard results yet.