
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
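For a concrete picture, here is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit with Beta posteriors (the classic conjugate setup). All names and the simulation harness are illustrative, not taken from any paper listed below.

```python
import random

def thompson_step(successes, failures, rng=random):
    """Draw one belief per arm from its Beta(s+1, f+1) posterior
    and pick the arm whose sampled mean reward is highest."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_means, steps=5000, seed=0):
    """Simulate a Bernoulli bandit; returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(true_means)
    successes, failures = [0] * k, [0] * k
    for _ in range(steps):
        arm = thompson_step(successes, failures, rng)
        # Bernoulli reward with the arm's (hidden) success probability.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [successes[i] + failures[i] for i in range(k)]
```

Because arms are chosen with probability equal to their posterior probability of being optimal, exploration fades naturally: after enough pulls, e.g. `run_bandit([0.2, 0.5, 0.8])` concentrates almost all pulls on the best arm.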

Papers

Showing 401–450 of 655 papers

Title / Status / Hype — every paper on this page has Hype 0; entries marked [Code] have code available.

Position-Based Multiple-Play Bandits with Thompson Sampling
Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control
Partially Observable Online Change Detection via Smooth-Sparse Decomposition
Bandits Under The Influence (Extended Version)
Causal Bandits without prior knowledge using separating sets
Thompson Sampling for Unsupervised Sequential Selection
A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits
Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling
Contextual Bandits for Advertising Budget Allocation
Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses
Reinforcement Learning with Trajectory Feedback
Lenient Regret for Multi-Armed Bandits
IntelligentPooling: Practical Thompson Sampling for mHealth
Greedy Bandits with Sampled Context
Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
Variable Selection via Thompson Sampling
Policy Gradient Optimization of Thompson Sampling Policies
Asynchronous Multi Agent Active Search
Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect
Constrained Thompson Sampling for Real-Time Electricity Pricing with Grid Reliability Constraints
Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring
Latent Bandits Revisited
Hypermodels for Exploration
TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation
On Frequentist Regret of Linear Thompson Sampling
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
Scalable Thompson Sampling using Sparse Gaussian Process Models
Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling
Concurrent Decentralized Channel Allocation and Access Point Selection using Multi-Armed Bandits in multi BSS WLANs
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints
Learning to Rank in the Position Based Model with Bandit Feedback
Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization
Adaptive Operator Selection Based on Dynamic Thompson Sampling for MOEA/D
Thompson Sampling for Linearly Constrained Bandits [Code]
Optimal No-regret Learning in Repeated First-price Auctions
A Reliability-aware Multi-armed Bandit Approach to Learn and Select Users in Demand Response
Delay-Adaptive Learning in Generalized Linear Contextual Bandits
Online Residential Demand Response via Contextual Multi-Armed Bandits
Odds-Ratio Thompson Sampling to Control for Time-Varying Effect [Code]
An Online Learning Framework for Energy-Efficient Navigation of Electric Vehicles
MOTS: Minimax Optimal Thompson Sampling
Efficient exploration of zero-sum stochastic games
On Thompson Sampling with Langevin Algorithms
Residual Bootstrap Exploration for Bandit Algorithms
A General Theory of the Stochastic Linear Bandit and Its Applications
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity
Thompson Sampling Algorithms for Mean-Variance Bandits [Code]
Bayesian Quantile and Expectile Optimisation
On Thompson Sampling for Smoother-than-Lipschitz Bandits
Page 9 of 14

No leaderboard results yet.