Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
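The procedure above can be sketched for the simplest case, Bernoulli rewards with Beta priors: maintain a Beta posterior per arm, sample one value from each posterior, and play the arm whose sample is largest. This is a minimal illustrative sketch, not code from any of the papers listed below; the function name and parameters are ours.

```python
import random

def thompson_sampling_bernoulli(arms, pulls=1000, seed=0):
    """Beta-Bernoulli Thompson sampling.

    arms: list of true (unknown to the agent) Bernoulli win rates.
    Returns the per-arm Beta parameters and the total reward collected.
    """
    rng = random.Random(seed)
    # Beta(1, 1) uniform prior per arm: alpha = successes + 1, beta = failures + 1
    alpha = [1] * len(arms)
    beta = [1] * len(arms)
    total_reward = 0
    for _ in range(pulls):
        # Draw one belief sample per arm, then act greedily on the samples.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(len(arms))]
        i = samples.index(max(samples))
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < arms[i] else 0
        total_reward += reward
        alpha[i] += reward
        beta[i] += 1 - reward
    return alpha, beta, total_reward
```

Because exploitation happens through the posterior samples themselves, arms with uncertain estimates still get tried occasionally, while the posterior of the best arm concentrates and it ends up pulled most often.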

Papers

Showing 171–180 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Approximate information for efficient exploration-exploitation strategies | | 0 |
| Thompson Sampling under Bernoulli Rewards with Local Differential Privacy | | 0 |
| Thompson sampling for improved exploration in GFlowNets | | 0 |
| Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits | | 0 |
| Scalable Neural Contextual Bandit for Recommender Systems | | 0 |
| Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | | 0 |
| Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space | | 0 |
| Incentivizing Exploration with Linear Contexts and Combinatorial Actions | | 0 |
| ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages | Code | 0 |
| Combinatorial Neural Bandits | | 0 |
Page 18 of 66

No leaderboard results yet.