SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior distribution over the unknown reward parameters, then chooses the action that maximizes the expected reward under that sampled belief.
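As a minimal sketch of the idea, assume a Bernoulli bandit with a uniform Beta(1, 1) prior on each arm's success probability. This is one common instance, not the only form of Thompson sampling. The function names `thompson_step` and `run_bandit` and the example reward probabilities are hypothetical, chosen only for illustration.

```python
import random


def thompson_step(successes, failures, rng):
    """Sample one belief per arm from its Beta posterior and pick the best.

    Assumes Bernoulli rewards with a uniform Beta(1, 1) prior, so the
    posterior of each arm is Beta(successes + 1, failures + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the sampled (not the mean) belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward 1 with the arm's true probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.2, 0.5, 0.8], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because each arm is chosen with the probability that it is optimal under the current posterior, poorly understood arms are still tried occasionally, while pulls tend to concentrate on the arm with the highest true reward probability as evidence accumulates.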

Papers

Showing 461–470 of 655 papers

Title | Status | Hype
Thompson Sampling for (Combinatorial) Pure Exploration | | 0
Thompson Sampling for Combinatorial Semi-Bandits | | 0
Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints | | 0
Thompson Sampling for Complex Bandit Problems | | 0
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | | 0
Thompson Sampling for Dynamic Pricing | | 0
Thompson Sampling for Gaussian Entropic Risk Bandits | | 0
Thompson sampling for improved exploration in GFlowNets | | 0
Thompson Sampling for Infinite-Horizon Discounted Decision Processes | | 0
Thompson Sampling for Learning Parameterized Markov Decision Processes | | 0

No leaderboard results yet.