
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
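The definition above can be sketched concretely for the Bernoulli multi-armed bandit: each arm keeps a Beta posterior over its unknown reward probability, and each round the action maximizing the reward under one random posterior draw is pulled. This is a minimal illustration using only the Python standard library; the function name and parameter choices are ours, not taken from any paper listed below.

```python
import random

def thompson_sampling(true_probs, n_rounds=5000, seed=0):
    """Thompson sampling for a Bernoulli multi-armed bandit.

    Each arm i keeps a Beta(successes_i + 1, failures_i + 1) posterior
    over its reward probability. Every round we draw one sample per arm
    from its posterior and pull the arm whose sample is largest, i.e. we
    act greedily with respect to a randomly drawn belief.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # One belief per arm, drawn from its Beta posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(n_arms)]
        # Choose the action that is best under this sampled belief.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward

# Hypothetical three-armed bandit with reward probabilities 0.3, 0.5, 0.7.
counts_s, counts_f, reward = thompson_sampling([0.3, 0.5, 0.7])
pulls = [s + f for s, f in zip(counts_s, counts_f)]
best_arm = max(range(3), key=lambda i: pulls[i])
```

Because arms with uncertain posteriors occasionally produce large samples, the algorithm keeps exploring them, while posterior mass concentrating on the best arm makes it pulled increasingly often; this is how the single random draw balances exploration and exploitation.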

Papers

Showing 551-600 of 655 papers

Title | Status | Hype
Efficient Exploration through Bayesian Deep Q-Networks | Code | 0
Thompson Sampling for Dynamic Pricing | - | 0
Information Directed Sampling and Bandits with Heteroscedastic Noise | - | 0
Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling | - | 0
On Adaptive Estimation for Dynamic Bernoulli Bandits | - | 0
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds | - | 0
Efficient exploration with Double Uncertain Value Networks | - | 0
Customized Nonlinear Bandits for Online Response Selection in Neural Conversation Models | - | 0
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies | - | 0
BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems | - | 0
Estimating prediction error for complex samples | - | 0
Efficient-UCBV: An Almost Optimal Algorithm using Variance Estimates | - | 0
Information Directed Sampling for Stochastic Bandits with Graph Feedback | - | 0
The Effect of Communication on Noncooperative Multiplayer Multi-Armed Bandit Problems | - | 0
Generalized Probabilistic Bisection for Stochastic Root-Finding | - | 0
Minimal Exploration in Structured Stochastic Bandits | - | 0
Sequential Matrix Completion | - | 0
A study of Thompson Sampling with Parameter h | - | 0
Learning Unknown Markov Decision Processes: A Thompson Sampling Approach | - | 0
Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits | - | 0
Bayesian bandits: balancing the exploration-exploitation tradeoff via double sampling | Code | 0
Variational inference for the multi-armed contextual bandit | Code | 0
Learning to Price with Reference Effects | - | 0
Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors | - | 0
Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems | - | 0
Reinforcement learning techniques for Outer Loop Link Adaptation in 4G/5G systems | - | 0
Streaming kernel regression with provably adaptive mean, variance, and regularization | - | 0
Counterfactual Data-Fusion for Online Reinforcement Learners | - | 0
Taming Non-stationary Bandits: A Bayesian Approach | - | 0
Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret | - | 0
Calibrated Fairness in Bandits | - | 0
A Practical Method for Solving Contextual Bandit Problems Using Decision Trees | - | 0
Bandit Models of Human Behavior: Reward Processing in Mental Disorders | - | 0
Parallel and Distributed Thompson Sampling for Large-scale Accelerated Exploration of Chemical Space | - | 0
Thompson Sampling for the MNL-Bandit | - | 0
Scalable Generalized Linear Bandits: Online Computation and Hashing | - | 0
Asynchronous Parallel Bayesian Optimisation via Thompson Sampling | Code | 0
A Multi-Armed Bandit to Smartly Select a Training Set from Big Medical Data | - | 0
AIXIjs: A Software Demo for General Reinforcement Learning | Code | 0
Ensemble Sampling | - | 0
Posterior sampling for reinforcement learning: worst-case regret bounds | - | 0
Adaptive Rate of Convergence of Thompson Sampling for Gaussian Process Optimization | - | 0
Context Attentive Bandits: Contextual Bandit with Restricted Context | - | 0
Multi-dueling Bandits with Dependent Arms | - | 0
Mostly Exploration-Free Algorithms for Contextual Bandits | Code | 0
Time-Sensitive Bandit Learning and Satisficing Thompson Sampling | - | 0
Efficient Benchmarking of NLP APIs using Multi-armed Bandits | - | 0
Thompson Sampling for Linear-Quadratic Control Problems | - | 0
Horde of Bandits using Gaussian Markov Random Fields | - | 0
QoS-Aware Multi-Armed Bandits | - | 0
Page 12 of 14
