SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
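The idea above can be sketched concretely for the Bernoulli bandit, where each arm's belief is a Beta posterior. This is an illustrative toy, not code from any of the papers listed below; the arm probabilities are made up:

```python
import random

def thompson_sampling_bernoulli(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy multi-armed bandit.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    reward probability. Each round, one belief is sampled per arm and the
    arm whose sampled belief is highest is pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        # Randomly draw a belief for each arm from its posterior.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        # Act greedily with respect to the randomly drawn belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [successes[a] + failures[a] for a in range(n_arms)]

# Hypothetical arms with success probabilities 0.3, 0.5, 0.7.
pulls = thompson_sampling_bernoulli([0.3, 0.5, 0.7])
```

Because posterior draws for under-explored arms remain spread out, the algorithm keeps exploring them occasionally, but concentrates most pulls on the empirically best arm as its posterior sharpens.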

Papers

Showing 451–500 of 655 papers

Title (Status: none listed; Hype: 0 for all entries)

Thompson Sampling Achieves Õ(√T) Regret in Linear Quadratic Control
Thompson Sampling with Approximate Inference
Thompson Sampling and Approximate Inference
Analysis of Thompson Sampling for Controlling Unknown Linear Diffusion Processes
Thompson Sampling for 1-Dimensional Exponential Family Bandits
Thompson Sampling for Adversarial Bit Prediction
Thompson Sampling for Bandits with Clustered Arms
Thompson Sampling for Budgeted Multi-armed Bandits
Thompson Sampling Algorithms for Cascading Bandits
Thompson Sampling for Combinatorial Network Optimization in Unknown Environments
Thompson Sampling for (Combinatorial) Pure Exploration
Thompson Sampling for Combinatorial Semi-Bandits
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints
Thompson Sampling for Complex Bandit Problems
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints
Thompson Sampling for Dynamic Pricing
Thompson Sampling for Gaussian Entropic Risk Bandits
Thompson sampling for improved exploration in GFlowNets
Thompson Sampling for Infinite-Horizon Discounted Decision Processes
Thompson Sampling for Learning Parameterized Markov Decision Processes
Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors
Thompson Sampling for Linear-Quadratic Control Problems
Thompson sampling for linear quadratic mean-field teams
Thompson Sampling for Noncompliant Bandits
Thompson Sampling for Online Learning with Linear Experts
Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions
Thompson Sampling for Pursuit-Evasion Problems
Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit
Thompson Sampling For Stochastic Bandits with Graph Feedback
Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis
Thompson Sampling for the MNL-Bandit
Thompson Sampling for Unimodal Bandits
Thompson Sampling for Unsupervised Sequential Selection
Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study
Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems
Thompson Sampling in Dynamic Systems for Contextual Bandit Problems
Thompson Sampling in Non-Episodic Restless Bandits
Thompson Sampling in Online RLHF with General Function Approximation
Thompson Sampling in Partially Observable Contextual Bandits
Thompson Sampling is Asymptotically Optimal in General Environments
Thompson Sampling Itself is Differentially Private
Thompson Sampling-like Algorithms for Stochastic Rising Bandits
Thompson Sampling on Asymmetric α-Stable Bandits
Thompson Sampling on Symmetric α-Stable Bandits
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards
Thompson Sampling under Bernoulli Rewards with Local Differential Privacy
Thompson Sampling with a Mixture Prior
Thompson Sampling with Diffusion Generative Prior
Thompson sampling with the online bootstrap
Thompson Sampling with Unrestricted Delays
Page 10 of 14

No leaderboard results yet.