
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each round it draws a sample from the current posterior belief over the reward model and chooses the action that maximizes expected reward under that sampled belief.
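A minimal sketch of this idea for a Bernoulli bandit, assuming independent Beta(1, 1) priors on each arm; the reward probabilities, horizon, and seed below are illustrative choices, not taken from any of the papers listed on this page:

```python
# Sketch: Thompson sampling for a Bernoulli multi-armed bandit with Beta priors.
# All numbers here (arm probabilities, 1000 rounds, seed 0) are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.3, 0.5, 0.7])   # hidden Bernoulli reward probabilities
successes = np.ones(len(true_probs))     # Beta alpha parameters (prior: Beta(1, 1))
failures = np.ones(len(true_probs))      # Beta beta parameters

for _ in range(1000):
    # Draw one belief sample per arm from its Beta posterior, act greedily on it.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))

    # Observe a Bernoulli reward and update the chosen arm's posterior counts.
    reward = rng.random() < true_probs[arm]
    if reward:
        successes[arm] += 1
    else:
        failures[arm] += 1

print("posterior means:", successes / (successes + failures))
```

Because each action is chosen against a random posterior draw rather than the posterior mean, under-explored arms still get pulled with some probability, which is what balances exploration against exploitation.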

Papers

Showing 501–550 of 655 papers

Title | Status | Hype
Policy Gradient Optimization of Thompson Sampling Policies |  | 0
Position-Based Multiple-Play Bandits with Thompson Sampling |  | 0
Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds |  | 0
Posterior sampling for reinforcement learning: worst-case regret bounds |  | 0
Posterior Sampling via Autoregressive Generation |  | 0
Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection |  | 0
Preferential Multi-Objective Bayesian Optimization |  | 0
Prior-free and prior-dependent regret bounds for Thompson Sampling |  | 0
Probabilistic Inference in Reinforcement Learning Done Right |  | 0
Profitable Bandits |  | 0
QoS-Aware Multi-Armed Bandits |  | 0
Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors |  | 0
Random Effect Bandits |  | 0
Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization |  | 0
Randomised Bayesian Least-Squares Policy Iteration |  | 0
Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning |  | 0
Regenerative Particle Thompson Sampling |  | 0
Regret Analysis of Bandit Problems with Causal Background Knowledge |  | 0
Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits |  | 0
Regret Bounds for Information-Directed Reinforcement Learning |  | 0
Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles |  | 0
Reinforcement Learning for Efficient and Tuning-Free Link Adaptation |  | 0
Reinforcement learning techniques for Outer Loop Link Adaptation in 4G/5G systems |  | 0
Reinforcement Learning with Subspaces using Free Energy Paradigm |  | 0
Reinforcement Learning with Trajectory Feedback |  | 0
Remote Contextual Bandits |  | 0
Residual Bootstrap Exploration for Bandit Algorithms |  | 0
Revised Progressive-Hedging-Algorithm Based Two-layer Solution Scheme for Bayesian Reinforcement Learning |  | 0
Reward Biased Maximum Likelihood Estimation for Reinforcement Learning |  | 0
Risk and optimal policies in bandit experiments |  | 0
Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs |  | 0
Risk-Constrained Thompson Sampling for CVaR Bandits |  | 0
Robust Dynamic Assortment Optimization in the Presence of Outlier Customers |  | 0
Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments |  | 0
Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks |  | 0
Safe Linear Leveling Bandits |  | 0
Safe Linear Thompson Sampling with Side Information |  | 0
Sample-based Dynamic Hierarchical Transformer with Layer and Head Flexibility via Contextual Bandit |  | 0
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity |  | 0
Sampling Acquisition Functions for Batch Bayesian Optimization |  | 0
Satisficing in Time-Sensitive Bandit Learning |  | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype |  | 0
Scalable Generalized Linear Bandits: Online Computation and Hashing |  | 0
Scalable Neural Contextual Bandit for Recommender Systems |  | 0
Scalable regret for learning to control network-coupled subsystems with unknown dynamics |  | 0
Scalable Thompson Sampling using Sparse Gaussian Process Models |  | 0
Scalable Thompson Sampling via Optimal Transport |  | 0
Scaling Multi-Armed Bandit Algorithms |  | 0
Screening for an Infectious Disease as a Problem in Stochastic Control |  | 0
Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization |  | 0
Page 11 of 14

No leaderboard results yet.