SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
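
The description above can be made concrete with a minimal sketch for a Bernoulli bandit. It assumes a uniform Beta(1, 1) prior on each arm and a simulated environment; the function name `thompson_sampling`, the reward probabilities, and the round count are illustrative choices and do not come from any of the papers listed here. In each round the "randomly drawn belief" is one sample per arm from that arm's Beta posterior, and the action chosen is the arm whose sampled mean reward is largest.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pull counts per arm.

    true_probs are hypothetical success probabilities, hidden from the agent
    and used only to simulate rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief: a sampled mean reward for each arm from its
        # Beta(successes + 1, failures + 1) posterior (uniform prior assumed).
        sampled_means = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: sampled_means[a])

        # Simulate a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

Arms with few observations have wide posteriors, so they occasionally produce the largest sample and get explored; as evidence accumulates, the posteriors concentrate and most pulls go to the arm with the highest estimated reward.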

Papers

Showing 551-575 of 655 papers

Title | Status | Hype
Adaptive Sensor Placement for Continuous Spaces | | 0
Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation | | 0
A Distributed Neural Linear Thompson Sampling Framework to Achieve URLLC in Industrial IoT | | 0
Adjusted Expected Improvement for Cumulative Regret Minimization in Noisy Bayesian Optimization | | 0
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation | | 0
A Formal Solution to the Grain of Truth Problem | | 0
A General Theory of the Stochastic Linear Bandit and Its Applications | | 0
A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms | | 0
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | | 0
Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization | | 0
Aligning AI Agents via Information-Directed Sampling | | 0
A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data | | 0
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | | 0
Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring | | 0
Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms | | 0
Adaptive Rate of Convergence of Thompson Sampling for Gaussian Process Optimization | | 0
Analysis of Thompson Sampling for Graphical Bandits Without the Graphs | | 0
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0
Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits | | 0
An Analysis of Ensemble Sampling | | 0
An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits | | 0
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling | | 0
An Empirical Evaluation of Thompson Sampling | | 0
An Extremely Data-efficient and Generative LLM-based Reinforcement Learning Agent for Recommenders | | 0
An improved regret analysis for UCB-N and TS-N | | 0

No leaderboard results yet.