SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
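
The description above can be made concrete with a minimal sketch for a Bernoulli bandit. It is an illustration only, not taken from any of the listed papers: the uniform Beta(1, 1) prior, the function names `thompson_step` and `run_bandit`, and the example success rates are all assumptions. Each round draws one plausible success rate per arm from its posterior (the "randomly drawn belief") and plays the arm whose draw is largest.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by sampling one belief per arm and taking the argmax."""
    # Beta(1 + s, 1 + f) is the posterior over an arm's success rate under
    # a uniform Beta(1, 1) prior (an assumption of this sketch).
    samples = [
        rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the sampled belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_rates, rounds, seed=0):
    """Simulate Thompson sampling and return the pull count of each arm."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated Bernoulli reward from the (hypothetical) true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Hypothetical success rates; the last arm is the best one.
    print(run_bandit([0.1, 0.5, 0.9], rounds=2000))
```

Early on the posteriors are wide, so every arm has a reasonable chance of producing the largest draw and gets explored. As evidence accumulates, the draws concentrate around the observed rates and most pulls go to the best arm.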

Papers

Showing 351–360 of 655 papers

Title | Status | Hype
Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models | | 0
Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment | | 0
The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling | | 0
Meta-Thompson Sampling | | 0
On the Suboptimality of Thompson Sampling in High Dimensions | Code | 0
State-Aware Variational Thompson Sampling for Deep Q-Networks | Code | 0
Doubly robust Thompson sampling for linear payoffs | | 0
Weak Signal Asymptotics for Sequentially Randomized Experiments | | 0
An empirical evaluation of active inference in multi-armed bandits | Code | 1
Scalable Optimization for Wind Farm Control using Coordination Graphs | Code | 0

No leaderboard results yet.