Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
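The description above can be made concrete with a minimal sketch. It assumes a Bernoulli bandit (rewards are 0 or 1) with an independent uniform Beta(1, 1) prior on each arm, which is only one common setting and is not taken from any paper listed here. The function names `thompson_step` and `run_bandit` are hypothetical. Each round, one plausible mean reward per arm is drawn from its posterior (the "randomly drawn belief"), and the arm whose draw is largest is played.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm: sample one mean per arm from its Beta posterior, take the argmax.

    Assumes Bernoulli rewards and a Beta(1, 1) prior, so the posterior for an arm
    with s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, n_rounds, seed=0):
    """Simulate Thompson sampling on a synthetic Bernoulli bandit.

    true_means is a hypothetical list of per-arm success probabilities used only
    to generate rewards; the algorithm itself never reads it.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Draw a 0/1 reward from the chosen arm's true success probability.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Pull counts per arm; the arm with the highest true mean should dominate.
    print(run_bandit([0.1, 0.5, 0.9], 2000))
```

Exploration comes from the randomness of the posterior draws: arms with few observations have wide posteriors and are occasionally sampled high, while well-observed arms with low means are rarely chosen.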

Papers

Showing 571–580 of 655 papers

Title | Status | Hype
An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits | | 0
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling | | 0
An Empirical Evaluation of Thompson Sampling | | 0
An Extremely Data-efficient and Generative LLM-based Reinforcement Learning Agent for Recommenders | | 0
An improved regret analysis for UCB-N and TS-N | | 0
An Information-Theoretic Analysis for Thompson Sampling with Many Actions | | 0
An Information-Theoretic Analysis of Thompson Sampling | | 0
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits | | 0
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces | | 0
An Online Learning Framework for Energy-Efficient Navigation of Electric Vehicles | | 0

No leaderboard results yet.