
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
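The idea above can be sketched for the Bernoulli multi-armed bandit, where each arm's success probability gets a Beta posterior: each round, sample one value per arm from its posterior and pull the arm with the largest sample. This is a minimal illustration, not taken from any of the papers listed below, and the arm probabilities are made up.

```python
import random

def thompson_sampling(successes, failures):
    """Choose an arm: draw one sample from each arm's Beta(s+1, f+1)
    posterior (uniform prior) and return the index of the largest draw."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

# Illustrative simulation with hypothetical arm probabilities.
random.seed(0)
true_probs = [0.3, 0.5, 0.7]
successes = [0, 0, 0]
failures = [0, 0, 0]
for _ in range(2000):
    arm = thompson_sampling(successes, failures)
    if random.random() < true_probs[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1
```

Because draws from a near-certain good arm rarely lose to draws from a near-certain bad one, the pull counts concentrate on the best arm while still occasionally exploring the others.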

Papers

Showing 11–20 of 655 papers

Title | Status | Hype
Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search | Code | 1
Mercer Features for Efficient Combinatorial Bayesian Optimization | Code | 1
Neural Exploitation and Exploration of Contextual Bandits | Code | 1
Neural Thompson Sampling | Code | 1
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks | Code | 1
An empirical evaluation of active inference in multi-armed bandits | Code | 1
A Tutorial on Thompson Sampling | Code | 1
Batched Bayesian optimization by maximizing the probability of including the optimum | Code | 1
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling | Code | 1
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1

No leaderboard results yet.