SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior over reward models and chooses the action that maximizes the expected reward under that draw.
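As a minimal sketch of the idea described above, the following assumes a Bernoulli bandit with a uniform Beta(1, 1) prior on each arm; that choice of model is an assumption made for illustration and is not taken from any paper listed here. The function name `thompson_sampling` and its parameters (`true_probs`, `n_rounds`, `seed`) are hypothetical.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative only).

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns per-arm pull counts, successes, and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior.
        # Beta(1 + successes, 1 + failures) assumes a uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulated environment: Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Posterior update for the chosen arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because each action is greedy only with respect to a sampled belief, arms with uncertain posteriors still get chosen occasionally, and the exploration fades as the posteriors concentrate.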

Papers

Showing 71-80 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| A resource-constrained stochastic scheduling algorithm for homeless street outreach and gleaning edible food | | 0 |
| Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation | | 0 |
| Approximate Thompson Sampling for Learning Linear Quadratic Regulators with O(√T) Regret | | 0 |
| Approximate information for efficient exploration-exploitation strategies | | 0 |
| Fast Change Identification in Multi-Play Bandits and its Applications in Wireless Networks | | 0 |
| A Bayesian Choice Model for Eliminating Feedback Loops | | 0 |
| Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification | | 0 |
| A Practical Method for Solving Contextual Bandit Problems Using Decision Trees | | 0 |
| A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning | | 0 |
| An Unbiased Data Collection and Content Exploitation/Exploration Strategy for Personalization | | 0 |

No leaderboard results yet.