
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
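The idea can be illustrated with a minimal sketch for the Bernoulli bandit case (this is an illustrative example, not drawn from any paper listed below): each arm keeps a Beta posterior over its success rate, one sample is drawn from each posterior per round, and the arm with the highest sampled value is played. The `true_probs` argument is a hypothetical simulator of the environment, unknown to the agent.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical per-arm success probabilities (hidden from
    the agent; used only to simulate rewards). Each arm maintains a
    Beta(successes + 1, failures + 1) posterior, starting from a
    uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior belief over its
        # success rate ...
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        # ... and play the arm whose sampled rate is highest. Uncertain
        # arms produce high samples occasionally (exploration); arms
        # with strong evidence of high reward win most rounds
        # (exploitation).
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

Run on a two-armed bandit, the agent concentrates its pulls on the better arm as its posteriors sharpen, without any explicit exploration schedule.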

Papers

Showing 176–200 of 655 papers

Convolutional Monte Carlo Rollouts in Go
Cost Aware Asynchronous Multi-Agent Active Search
Cost-efficient Knowledge-based Question Answering with Large Language Models
Asymptotically Optimal Bandits under Weighted Information
Counterfactual Data-Fusion for Online Reinforcement Learners
Counterfactual Inference under Thompson Sampling
Towards Efficient and Optimal Covariance-Adaptive Algorithms for Combinatorial Semi-Bandits
Cover Tree Bayesian Reinforcement Learning
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models
Asymptotic Convergence of Thompson Sampling
Debiasing Samples from Online Learning Using Bootstrap
Decentralized Multi-Agent Active Search and Tracking when Targets Outnumber Agents
Deciding What to Learn: A Rate-Distortion Approach
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine
Deep Active Ensemble Sampling For Image Classification
Bayesian Quantile and Expectile Optimisation
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits
Deep Contextual Multi-armed Bandits
Deep Exploration for Recommendation Systems
Deep Hierarchy in Bandits
Delay-Adaptive Learning in Generalized Linear Contextual Bandits
Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce
Differentially Private Federated Bayesian Optimization with Distributed Exploration
Diffusion Approximations for Thompson Sampling
A Copula approach for hyperparameter transfer learning

No leaderboard results yet.