SOTAVerified

Thompson Sampling

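Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief (for example, a set of reward-model parameters) at random from the current posterior distribution, chooses the action that maximizes the expected reward under that sampled belief, and then updates the posterior with the observed reward.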
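As an illustration of the sample-then-act loop, here is a minimal sketch for the simplest case. It assumes Bernoulli (0/1) rewards with independent Beta(1, 1) priors on each arm, so the posterior update is a conjugate count increment. The function name `thompson_sampling` and the arm probabilities in the usage line are hypothetical and are used only to simulate an environment.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated k-armed bandit.

    true_probs: hypothetical success probability of each arm, hidden from the agent.
    Returns (per-arm pull counts, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) (uniform) prior on each arm's success probability.
    alpha = [1] * k  # 1 + number of observed successes
    beta = [1] * k   # 1 + number of observed failures
    pulls = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act greedily with respect to the sampled belief.
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the simulated environment.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward


pulls, total = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
print(pulls, total)
```

Exploration comes from posterior uncertainty alone. Arms with few observations have wide Beta posteriors and occasionally produce the highest sample. As evidence accumulates, the posteriors narrow and pulls concentrate on the best arm without an explicit exploration parameter.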

Papers

Showing 411-420 of 655 papers

Title | Status | Hype
--- | --- | ---
Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits | | 0
Linear Bandit algorithms using the Bootstrap | | 0
Linear Thompson Sampling Revisited | | 0
Little Exploration is All You Need | | 0
Maillard Sampling: Boltzmann Exploration Done Optimally | | 0
Making RL with Preference-based Feedback Efficient via Randomization | | 0
Making Sense of Reinforcement Learning and Probabilistic Inference | | 0
Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow | | 0
Optimization-Driven Adaptive Experimentation | | 0
Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents | | 0

No leaderboard results yet.