
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
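
As a concrete illustration of the idea, the sketch below implements Thompson sampling for a Bernoulli bandit with an independent Beta(1, 1) prior per arm; the function name and the "pull" callback are illustrative assumptions, not taken from any of the papers listed here.

import random

def thompson_sampling_bernoulli(n_arms, n_rounds, pull):
    # Assumed interface: pull(arm) returns a 0/1 reward for the chosen arm.
    # Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    # unknown success probability.
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior belief ...
        samples = [random.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        # ... and play the arm that looks best under that random draw.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = pull(arm)
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Example run with three arms whose true success probabilities are hidden
# from the algorithm.
probs = [0.3, 0.5, 0.7]
thompson_sampling_bernoulli(3, 1000, lambda a: int(random.random() < probs[a]))

Sampling from the posteriors makes exploration automatic: arms with uncertain estimates occasionally produce high draws and get tried, while arms whose low rewards are well established are gradually abandoned.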

Papers

Showing 151-200 of 655 papers

Title | Status | Hype
An Analysis of Ensemble Sampling | | 0
Batch Bayesian Optimization for Replicable Experimental Design | | 0
Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits | | 0
Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization | | 0
Bandits Under The Influence (Extended Version) | | 0
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0
Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility | | 0
Bandit Models of Human Behavior: Reward Processing in Mental Disorders | | 0
Analysis of Thompson Sampling for Graphical Bandits Without the Graphs | | 0
Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits | | 0
A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms | | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | | 0
Bandit Learning for Diversified Interactive Recommendation | | 0
Adaptive Rate of Convergence of Thompson Sampling for Gaussian Process Optimization | | 0
Bandit Convex Optimization: √T Regret in One Dimension | | 0
Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control | | 0
Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms | | 0
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches | | 0
BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration | | 0
Bag of Policies for Distributional Deep Exploration | | 0
Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring | | 0
AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning | | 0
Automatic Ensemble Learning for Online Influence Maximization | | 0
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | | 0
Adaptive Data Augmentation for Thompson Sampling | | 0
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | | 0
A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data | | 0
A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning | | 0
Diffusion Approximations for Thompson Sampling | | 0
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems | | 0
Aligning AI Agents via Information-Directed Sampling | | 0
Differentially Private Federated Bayesian Optimization with Distributed Exploration | | 0
Delay-Adaptive Learning in Generalized Linear Contextual Bandits | | 0
Deep Hierarchy in Bandits | | 0
Deep Contextual Multi-armed Bandits | | 0
Asynchronous Multi Agent Active Search | | 0
Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization | | 0
Adaptive Combinatorial Allocation | | 0
A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits | | 0
A Batched Multi-Armed Bandit Approach to News Headline Testing | | 0
Deep Active Ensemble Sampling For Image Classification | | 0
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine | | 0
Deciding What to Learn: A Rate-Distortion Approach | | 0
Deep Exploration for Recommendation Systems | | 0
Decentralized Multi-Agent Active Search and Tracking when Targets Outnumber Agents | | 0
Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | | 0
Debiasing Samples from Online Learning Using Bootstrap | | 0
Asymptotic Convergence of Thompson Sampling | | 0
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models | | 0
Page 4 of 14
