SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief about the rewards at random from the current posterior and chooses the action that maximizes the expected reward under that drawn belief.
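As an illustration of the description above, here is a minimal sketch for the simplest case: a Bernoulli bandit with independent Beta(1, 1) priors on each arm. The function name `thompson_sampling`, its parameters, and the choice of a Beta-Bernoulli model are assumptions made for this example; none of them come from the papers listed on this page.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns (pulls, successes, failures), one list entry per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior.
        # The +1 terms encode the assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they are still occasionally sampled as the best arm (exploration), while arms with strong evidence of high reward are chosen most of the time (exploitation).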

Papers

Showing 176–200 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems | | 0 |
| Aligning AI Agents via Information-Directed Sampling | | 0 |
| Differentially Private Federated Bayesian Optimization with Distributed Exploration | | 0 |
| Delay-Adaptive Learning in Generalized Linear Contextual Bandits | | 0 |
| Deep Hierarchy in Bandits | | 0 |
| Deep Contextual Multi-armed Bandits | | 0 |
| Asynchronous Multi Agent Active Search | | 0 |
| Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization | | 0 |
| Adaptive Combinatorial Allocation | | 0 |
| A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits | | 0 |
| A Batched Multi-Armed Bandit Approach to News Headline Testing | | 0 |
| Deep Active Ensemble Sampling For Image Classification | | 0 |
| Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine | | 0 |
| Deciding What to Learn: A Rate-Distortion Approach | | 0 |
| Decentralized Multi-Agent Active Search and Tracking when Targets Outnumber Agents | | 0 |
| Asymptotic Performance of Thompson Sampling in the Batched Multi-Armed Bandits | | 0 |
| Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | | 0 |
| Debiasing Samples from Online Learning Using Bootstrap | | 0 |
| Deep Exploration for Recommendation Systems | | 0 |
| Asymptotic Convergence of Thompson Sampling | | 0 |
| Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models | | 0 |
| Cover Tree Bayesian Reinforcement Learning | | 0 |
| The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0 |
| A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms | | 0 |
| Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits | | 0 |

No leaderboard results yet.