SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random (a sample from the current posterior over reward models) and chooses the action that maximizes the expected reward under that sampled belief.
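The description above can be made concrete with a small sketch. It assumes a Bernoulli bandit with independent Beta(1, 1) priors on each arm, which is one common setting and not the only one; the function names `thompson_step` and `run_bandit` and the reward probabilities are hypothetical, chosen for illustration only.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample one belief per arm from its Beta posterior, then act greedily on it."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    # Choose the arm whose sampled mean reward is largest.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds=2000, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit; return pulls per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Draw a 0/1 reward from the (hypothetical) true success probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Illustrative probabilities; the last arm is the best one.
    print(run_bandit([0.2, 0.5, 0.8]))
```

Because each arm's sample comes from its own posterior, arms with few observations still produce occasional high draws and get explored, while arms with many observations and high empirical reward are chosen most of the time.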

Papers

Showing 401-410 of 655 papers

Title | Status | Hype
Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect | | 0
Sample Efficient Learning of Factored Embeddings of Tensor Fields | | 0
Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration | | 0
Learning to Optimize Via Posterior Sampling | | 0
Learning to Price with Reference Effects | | 0
Learning to Rank in the Position Based Model with Bandit Feedback | | 0
Learning Unknown Markov Decision Processes: A Thompson Sampling Approach | | 0
Lenient Regret for Multi-Armed Bandits | | 0
Leveraging Demonstrations to Improve Online Learning: Quality Matters | | 0
Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control | | 0

No leaderboard results yet.