SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
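The idea above can be sketched for the simplest setting, a Bernoulli bandit with Beta posteriors: each arm keeps a Beta(successes + 1, failures + 1) belief over its reward probability, and each round the arm with the largest posterior draw is pulled. This is a minimal illustrative sketch, not code from any paper listed below; the function name and parameters are hypothetical.

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_probs: true success probability of each arm (unknown to the agent).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior: this is the
        # "randomly drawn belief" with respect to which we act greedily.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.5, 0.8], n_rounds=2000)
```

Because posterior samples for under-explored arms have high variance, such arms are occasionally selected (exploration), while arms with strong evidence of high reward dominate over time (exploitation); after enough rounds, the best arm receives most of the pulls.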

Papers

Showing 551-560 of 655 papers

Title | Status | Hype
Efficient Exploration through Bayesian Deep Q-Networks | Code | 0
Thompson Sampling for Dynamic Pricing | | 0
Information Directed Sampling and Bandits with Heteroscedastic Noise | | 0
Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling | | 0
On Adaptive Estimation for Dynamic Bernoulli Bandits | | 0
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds | | 0
Efficient exploration with Double Uncertain Value Networks | | 0
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models | | 0
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies | | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | | 0
