SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
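The description above can be made concrete with a minimal sketch for a Bernoulli bandit. It is only an illustration under stated assumptions: each arm pays a 0/1 reward, the belief about each arm is a Beta posterior starting from a uniform Beta(1, 1) prior, and the names `thompson_step` and `run_bandit` are hypothetical helpers introduced here, not taken from any paper listed on this page. Drawing one value per arm from its posterior is the "randomly drawn belief"; picking the arm with the largest draw maximizes expected reward under that belief.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm: draw one belief per arm, then act greedily on the draws."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(s + 1, f + 1).
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # The arm whose sampled success probability is highest.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with the arm's true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.5, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

In this sketch exploration comes entirely from the randomness of the posterior draws: arms with few observations have wide posteriors and are still chosen occasionally, while arms that are clearly worse are sampled less and less often.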

Papers

Showing 51-100 of 655 papers

Title | Status | Hype
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | | 0
Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring | | 0
Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms | | 0
Adaptive Rate of Convergence of Thompson Sampling for Gaussian Process Optimization | | 0
Analysis of Thompson Sampling for Graphical Bandits Without the Graphs | | 0
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0
Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits | | 0
An Analysis of Ensemble Sampling | | 0
An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits | | 0
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling | | 0
A Formal Solution to the Grain of Truth Problem | | 0
An Empirical Evaluation of Thompson Sampling | | 0
AdaptEx: A Self-Service Contextual Bandit Platform | | 0
BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration | | 0
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation | | 0
Adjusted Expected Improvement for Cumulative Regret Minimization in Noisy Bayesian Optimization | | 0
Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling | | 0
A Distributed Neural Linear Thompson Sampling Framework to Achieve URLLC in Industrial IoT | | 0
Active Reinforcement Learning with Monte-Carlo Tree Search | | 0
A Bandit Approach to Online Pricing for Heterogeneous Edge Resource Allocation | | 0
AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning | | 0
Bandit Change-Point Detection for Real-Time Monitoring High-Dimensional Data Under Sampling Control | | 0
Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation | | 0
Approximate Thompson Sampling for Learning Linear Quadratic Regulators with O(T) Regret | | 0
Approximate information for efficient exploration-exploitation strategies | | 0
Fast Change Identification in Multi-Play Bandits and its Applications in Wireless Networks | | 0
A Bayesian Choice Model for Eliminating Feedback Loops | | 0
A Practical Method for Solving Contextual Bandit Problems Using Decision Trees | | 0
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning | | 0
Efficiently Tackling Million-Dimensional Multiobjective Problems: A Direction Sampling and Fine-Tuning Approach | | 0
A Reinforcement Learning based Reset Policy for CDCL SAT Solvers | | 0
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems | | 0
A Reliability-aware Multi-armed Bandit Approach to Learn and Select Users in Demand Response | | 0
A resource-constrained stochastic scheduling algorithm for homeless street outreach and gleaning edible food | | 0
A sequential Monte Carlo approach to Thompson sampling for Bayesian optimization | | 0
A Simple and Optimal Policy Design with Safety against Heavy-Tailed Risk for Stochastic Bandits | | 0
A study of Thompson Sampling with Parameter h | | 0
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits | | 0
Asymptotically Optimal Bandits under Weighted Information | | 0
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget | | 0
The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0
Asymptotic Convergence of Thompson Sampling | | 0
Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | | 0
Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification | | 0
Asynchronous Multi Agent Active Search | | 0
Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization | | 0
An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization | | 0
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems | | 0
Adaptive Sensor Placement for Continuous Spaces | | 0
Page 2 of 14

No leaderboard results yet.