SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it chooses the action that maximizes the expected reward with respect to a belief drawn at random from the current posterior distribution.
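The following is a minimal sketch of Thompson sampling for a Bernoulli bandit. It assumes 0/1 rewards and an independent Beta(1, 1) prior on each arm's success probability. The function name `thompson_sampling`, the arm probabilities and the round count are hypothetical values chosen for illustration. They are not taken from any of the papers listed below.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (hidden from the agent).
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) is a uniform prior over each arm's success probability.
    alpha = [1] * k
    beta = [1] * k
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success probability per arm from its posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...then act greedily with respect to that randomly drawn belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate the environment's 0/1 reward for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: successes increment alpha, failures increment beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


pulls, reward = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
print(pulls, reward)
```

The Beta prior is used here because it is conjugate to the Bernoulli likelihood, so each posterior update reduces to incrementing one of two counters. Exploration comes from the randomness of the posterior draws. Other reward models would need a different prior and sampler.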

Papers

Showing 421-430 of 655 papers

Title | Status | Hype
Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models | | 0
Meta Dynamic Pricing: Transfer Learning Across Experiments | | 0
Meta Learning in Bandits within Shared Affine Subspaces | | 0
Metalearning Linear Bandits by Prior Update | | 0
Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks | | 0
Meta-Reinforcement Learning With Informed Policy Regularization | | 0
Meta-Thompson Sampling | | 0
Minimal Exploration in Structured Stochastic Bandits | | 0
TS-RSR: A provably efficient approach for batch Bayesian Optimization | | 0
Mixed-Variable Bayesian Optimization | | 0

No leaderboard results yet.