SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
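As a minimal sketch of the idea described above, the following assumes a Bernoulli bandit with a uniform Beta(1, 1) prior on each arm's success probability. That setting is a common textbook case and is not specified on this page. The names `thompson_step` and `run_bandit` are hypothetical helpers chosen for illustration. Each round draws one belief per arm from its posterior and plays the arm whose drawn belief has the highest expected reward.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw one belief per arm from its Beta posterior and pick the best arm."""
    # With a Beta(1, 1) prior (an assumption), the posterior for an arm is
    # Beta(successes + 1, failures + 1).
    samples = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # For Bernoulli rewards the sampled mean is the expected reward under
    # the drawn belief, so we choose the arm with the largest sample.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward 1 with the arm's true probability.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.9], n_rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get explored, while arms with consistently high estimates are played most often. Other reward models would need a different posterior update.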

Papers

Showing 351–400 of 655 papers

Title | Status | Hype
Thompson Sampling for Gaussian Entropic Risk Bandits | - | 0
High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling | - | 0
When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution | - | 0
Blind Exploration and Exploitation of Stochastic Experts | - | 0
Challenges in Statistical Analysis of Data Collected by a Bandit Algorithm: An Empirical Exploration in Applications to Adaptively Randomized Experiments | - | 0
Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection | - | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Automated Creative Optimization for E-Commerce Advertising | Code | 0
Online Multi-Armed Bandits with Adaptive Inference | - | 0
Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models | - | 0
Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment | - | 0
The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling | - | 0
Meta-Thompson Sampling | - | 0
On the Suboptimality of Thompson Sampling in High Dimensions | Code | 0
State-Aware Variational Thompson Sampling for Deep Q-Networks | Code | 0
Doubly robust Thompson sampling for linear payoffs | - | 0
Weak Signal Asymptotics for Sequentially Randomized Experiments | - | 0
Scalable Optimization for Wind Farm Control using Coordination Graphs | Code | 0
TSEC: a framework for online experimentation under experimental constraints | - | 0
Deciding What to Learn: A Rate-Distortion Approach | - | 0
Etat de l'art sur l'application des bandits multi-bras | - | 0
Meta-Reinforcement Learning With Informed Policy Regularization | - | 0
Learning to Play Imperfect-Information Games by Imitating an Oracle Planner | Code | 0
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | - | 0
Reinforcement Learning with Subspaces using Free Energy Paradigm | - | 0
Distributed Thompson Sampling | - | 0
On Efficiency in Hierarchical Reinforcement Learning | - | 0
Non-Stationary Latent Bandits | - | 0
Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning | - | 0
Risk-Constrained Thompson Sampling for CVaR Bandits | - | 0
Reward Biased Maximum Likelihood Estimation for Reinforcement Learning | - | 0
Accelerating Grasp Exploration by Leveraging Learned Priors | - | 0
Multi-Agent Active Search using Realistic Depth-Aware Noise Model | Code | 0
Thompson sampling for linear quadratic mean-field teams | - | 0
Asymptotic Convergence of Thompson Sampling | - | 0
Adaptive Combinatorial Allocation | - | 0
Greedy k-Center from Noisy Distance Samples | - | 0
Multi-armed Bandits with Cost Subsidy | - | 0
Screening for an Infectious Disease as a Problem in Stochastic Control | - | 0
Bandit Policies for Reliable Cellular Network Handovers in Extreme Mobility | - | 0
Sub-sampling for Efficient Non-Parametric Bandit Exploration | Code | 0
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration | - | 0
Bayesian Algorithms for Decentralized Stochastic Bandits | Code | 0
Reinforcement Learning for Efficient and Tuning-Free Link Adaptation | - | 0
Double-Linear Thompson Sampling for Context-Attentive Bandits | - | 0
Asynchronous ε-Greedy Bayesian Optimisation | Code | 0
Online Learning and Distributed Control for Residential Demand Response | - | 0
Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization | - | 0
Stage-wise Conservative Linear Bandits | - | 0
Neural Model-based Optimization with Right-Censored Observations | - | 0

No leaderboard results yet.