Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior over the reward model and chooses the action that maximizes the expected reward under that draw.
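
As a rough illustration of the description above, here is a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability, and the names `thompson_step` and `run_bandit` are hypothetical helpers chosen for this example, not part of any library or of the papers listed here.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by drawing one belief per arm and taking the best draw."""
    # Assumption: Beta(1, 1) prior, so the posterior for an arm with
    # s successes and f failures is Beta(1 + s, 1 + f).
    samples = [
        rng.betavariate(1 + s, 1 + f)
        for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the randomly drawn belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate a Bernoulli bandit; true_probs is hidden from the learner."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_probs[arm].
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run_bandit([0.2, 0.5, 0.8], n_rounds=2000)
    pulls = [a + b for a, b in zip(s, f)]
    print("pulls per arm:", pulls)
```

Arms with uncertain posteriors occasionally produce a high draw and get tried, which is the exploration; as evidence accumulates the draws concentrate and the best arm is selected most of the time.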

Papers

Showing 171-180 of 655 papers

Title | Status | Hype
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications | | 0
Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model | | 0
Contextual Multi-Armed Bandits for Causal Marketing | | 0
Contextual Thompson Sampling via Generation of Missing Data | | 0
Convergence Rates of Posterior Distributions in Markov Decision Process | | 0
Convolutional Monte Carlo Rollouts in Go | | 0
Cost Aware Asynchronous Multi-Agent Active Search | | 0
Cost-efficient Knowledge-based Question Answering with Large Language Models | | 0
Asymptotically Optimal Bandits under Weighted Information | | 0
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits | | 0
