SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
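
The description above can be illustrated with a minimal sketch for the simplest setting, a Bernoulli bandit with a Beta posterior per arm. This is an illustrative example, not an implementation from any paper listed on this page: the function name `thompson_sampling_bernoulli`, the uniform Beta(1, 1) prior, and the simulated `true_probs` are all assumptions made for the example. In each round, the "randomly drawn belief" is one sample per arm from its posterior, and the chosen action is the arm whose sampled mean reward is largest.

```python
import random


def thompson_sampling_bernoulli(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (hypothetical helper).

    true_probs: success probability of each arm, unknown to the agent and
                used only to simulate rewards.
    Returns per-arm lists (pulls, successes, failures).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its posterior,
        # Beta(successes + 1, failures + 1) under an assumed uniform prior.
        sampled_means = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: sampled_means[a])

        # Observe a simulated Bernoulli reward and update the posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures
```

Calling `thompson_sampling_bernoulli([0.1, 0.9], 2000)` returns the per-arm counts. Because arms with uncertain posteriors occasionally produce high samples, they keep being explored, while arms with clearly higher estimated reward are pulled most often.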

Papers

Showing 601–610 of 655 papers

Title | Status | Hype
Stacked Thompson Bandits | Code | 0
Thompson Sampling For Stochastic Bandits with Graph Feedback | - | 0
Estimating Quality in Multi-Objective Bandits Optimization | - | 0
Exploration for Multi-task Reinforcement Learning with Deep Generative Models | - | 0
Nonparametric General Reinforcement Learning | - | 0
Linear Thompson Sampling Revisited | - | 0
Unimodal Thompson Sampling for Graph-Structured Arms | - | 0
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits | - | 0
A Formal Solution to the Grain of Truth Problem | - | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | - | 0

No leaderboard results yet.