
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
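The idea above can be sketched concretely for the classic Beta-Bernoulli bandit: each arm keeps a Beta posterior over its unknown reward probability, one value is sampled from each posterior per round, and the arm with the largest sample is pulled. This is an illustrative sketch (arm count, round count, and function name are hypothetical), not code from any of the papers listed below.

```python
import random

def thompson_sampling(true_probs, n_rounds=10000, seed=0):
    """Beta-Bernoulli Thompson sampling on a multi-armed bandit.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    unknown reward probability (a uniform Beta(1, 1) prior). Each round we
    draw one sample per arm from its posterior and pull the arm whose
    sampled belief is largest, so exploration arises from posterior noise.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Randomly draw a belief about each arm's reward probability.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        # Act greedily with respect to the drawn beliefs.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward
```

As the posteriors of inferior arms concentrate below the best arm's, those arms are sampled highest less and less often, so pulls shift toward the optimal arm over time.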

Papers

Showing 341-350 of 655 papers

Title | Status | Hype
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks | Code | 1
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling | Code | 1
High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling | - | 0
When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution | - | 0
Blind Exploration and Exploitation of Stochastic Experts | - | 0
Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments | - | 0
Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection | - | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Automated Creative Optimization for E-Commerce Advertising | Code | 0
Online Multi-Armed Bandits with Adaptive Inference | - | 0
Page 35 of 66

No leaderboard results yet.