
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief (a set of model parameters) at random from the current posterior distribution, then chooses the action that maximizes the expected reward under that drawn belief.
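The selection rule described above can be sketched for the simplest case, a Bernoulli bandit with a Beta posterior per arm. This is a minimal illustration rather than a reference implementation: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the simulated `true_probs` are all assumptions made for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_probs: hypothetical success probability of each arm, used only
                to simulate rewards; the sampler never reads it directly.
    Returns per-arm pull counts, successes, and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its posterior, Beta(1 + s, 1 + f),
        # assuming a uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under the draw.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a 0/1 reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, _, _ = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
    print(pulls)
```

Because arms with uncertain posteriors occasionally produce high draws, they still get tried early on (exploration), while arms with consistently high posterior mass are chosen more and more often as evidence accumulates (exploitation).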

Papers

Showing 351-375 of 655 papers

Title | Status | Hype
Thompson Sampling for Gaussian Entropic Risk Bandits | - | 0
High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling | - | 0
When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution | - | 0
Blind Exploration and Exploitation of Stochastic Experts | - | 0
Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments | - | 0
Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection | - | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Automated Creative Optimization for E-Commerce Advertising | Code | 0
Online Multi-Armed Bandits with Adaptive Inference | - | 0
Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models | - | 0
Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment | - | 0
The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling | - | 0
Meta-Thompson Sampling | - | 0
On the Suboptimality of Thompson Sampling in High Dimensions | Code | 0
State-Aware Variational Thompson Sampling for Deep Q-Networks | Code | 0
Doubly robust Thompson sampling for linear payoffs | - | 0
Weak Signal Asymptotics for Sequentially Randomized Experiments | - | 0
Scalable Optimization for Wind Farm Control using Coordination Graphs | Code | 0
TSEC: a framework for online experimentation under experimental constraints | - | 0
Deciding What to Learn: A Rate-Distortion Approach | - | 0
Etat de l'art sur l'application des bandits multi-bras | - | 0
Meta-Reinforcement Learning With Informed Policy Regularization | - | 0
Learning to Play Imperfect-Information Games by Imitating an Oracle Planner | Code | 0
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | - | 0
Reinforcement Learning with Subspaces using Free Energy Paradigm | - | 0

No leaderboard results yet.