SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
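As an illustration of the idea described above (not code from this page), a minimal sketch of Thompson sampling for Bernoulli rewards with Beta priors, assuming a simple two-armed toy simulation:

```python
import random

def thompson_step(successes, failures):
    """One round of Beta-Bernoulli Thompson sampling.

    For each arm, draw one sample from its Beta(1 + s, 1 + f)
    posterior over the success probability, then play the arm
    whose sampled belief is largest.
    """
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

# Toy simulation (hypothetical arms): true means 0.3 and 0.7.
random.seed(0)
true_means = [0.3, 0.7]
successes = [0, 0]
failures = [0, 0]
for _ in range(1000):
    arm = thompson_step(successes, failures)
    if random.random() < true_means[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Because each arm is chosen with probability equal to the posterior probability that it is optimal, the sampler explores early (wide posteriors) and concentrates on the better arm as evidence accumulates.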

Papers

Showing 301–350 of 655 papers

Title (Status and Hype columns omitted; every entry shown has a Hype count of 0)

Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis
Module-wise Adaptive Distillation for Multimodality Foundation Models
Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning
Monte-Carlo tree search with uncertainty propagation via optimal transport
MOTS: Minimax Optimal Thompson Sampling
Multi-Agent Active Search using Detection and Location Uncertainty
Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian?
Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes
Multi-armed Bandits with Cost Subsidy
Multi-dueling Bandits with Dependent Arms
Multi-Task Combinatorial Bandits for Budget Allocation
Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses
Neural Contextual Bandits Under Delayed Feedback Constraints
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Neural Model-based Optimization with Right-Censored Observations
New Insights into Bootstrapping for Bandits
No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling
Nonparametric General Reinforcement Learning
Non-Stationary Bandit Learning via Predictive Sampling
Non-Stationary Dynamic Pricing Via Actor-Critic Information-Directed Pricing
Non-Stationary Latent Bandits
No Regrets for Learning the Prior in Bandits
Observation-Free Attacks on Stochastic Bandits
On Adaptive Estimation for Dynamic Bernoulli Bandits
On Batch Bayesian Optimization
On Dynamic Pricing with Covariates
On Efficiency in Hierarchical Reinforcement Learning
On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise
On Kernelized Multi-Armed Bandits with Constraints
On learning Whittle index policy for restless bandits with scalable regret
Online Algorithms For Parameter Mean And Variance Estimation In Dynamic Regression Models
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits
Online Causal Inference for Advertising in Real-Time Bidding Auctions
Online Learning and Distributed Control for Residential Demand Response
Online Learning-based Waveform Selection for Improved Vehicle Recognition in Automotive Radar
Online Learning of Energy Consumption for Navigation of Electric Vehicles
Online Learning of Network Bottlenecks via Minimax Paths
Online Residential Demand Response via Contextual Multi-Armed Bandits
Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling
On Multi-Armed Bandit Designs for Dose-Finding Clinical Trials
On Online Learning in Kernelized Markov Decision Processes
On The Differential Privacy of Thompson Sampling With Gaussian Prior
On the Importance of Uncertainty in Decision-Making with Large Language Models
On the Performance of Thompson Sampling on Logistic Bandits
On the Prior Sensitivity of Thompson Sampling
On Thompson Sampling for Smoother-than-Lipschitz Bandits
On Thompson Sampling with Langevin Algorithms
On Frequentist Regret of Linear Thompson Sampling
Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
Optimal Exploration is no harder than Thompson Sampling
Page 7 of 14
