Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
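As an illustration of the definition above, here is a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability. The "randomly drawn belief" is one sample from each arm's Beta posterior, and the chosen action is the arm whose sampled value is largest. The names `thompson_select` and `run_bandit` and the reward probabilities are hypothetical, chosen only for this example.

```python
import random


def thompson_select(successes, failures, rng):
    """Sample one belief per arm from its Beta posterior and return the argmax arm."""
    # Assumption: Beta(1, 1) prior, so the posterior is Beta(successes + 1, failures + 1).
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # Under the sampled belief, the expected Bernoulli reward of an arm is its
    # sampled success probability, so pick the arm with the largest sample.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit; return per-arm counts."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        # Draw a Bernoulli reward from the (hypothetical) true probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.2, 0.5, 0.8], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Early on the posteriors are wide, so every arm gets sampled (exploration). As evidence accumulates the posteriors narrow, and the arm with the highest estimated reward is chosen most of the time (exploitation).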

Papers

Showing 131–140 of 655 papers

Title | Status | Hype
Blind Exploration and Exploitation of Stochastic Experts | | 0
A Nonparametric Contextual Bandit with Arm-level Eligibility Control for Customer Service Routing | | 0
An Online Learning Framework for Energy-Efficient Navigation of Electric Vehicles | | 0
Adaptive Model Selection Framework: An Application to Airline Pricing | | 0
Belief Flows of Robust Online Learning | | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | | 0
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces | | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | | 0
Best Arm Identification in Batched Multi-armed Bandit Problems | | 0
Bayesian Quantile and Expectile Optimisation | | 0

No leaderboard results yet.