SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
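The belief-sampling rule described above can be sketched for the simplest setting, a Bernoulli multi-armed bandit with Beta posteriors. This is an illustrative implementation, not code from any of the papers listed below; the function name and parameters are chosen here for clarity.

```python
import random

def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a multi-armed bandit.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    unknown reward probability (a uniform Beta(1, 1) prior). Each round we
    draw one sample from every arm's posterior -- a "randomly drawn belief"
    -- and pull the arm whose sampled value is largest.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one belief per arm from its Beta posterior ...
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        # ... then act greedily with respect to that random belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (normally unknown) true
        # arm probability and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward
```

Because early uncertainty makes posterior samples spread widely, every arm gets explored at first; as evidence accumulates, the posteriors concentrate and pulls shift toward the best arm, balancing exploration and exploitation without any explicit schedule.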

Papers

Showing 451–500 of 655 papers

Title — papers with a linked implementation are marked [Code]; every paper on this page shows a hype score of 0.

Making Sense of Reinforcement Learning and Probabilistic Inference
Randomized Exploration for Non-Stationary Stochastic Linear Bandits [Code]
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling
Ordinal Bayesian Optimisation
Thompson Sampling and Approximate Inference
Thompson Sampling for Multinomial Logit Contextual Bandits [Code]
Bayesian Optimization for Categorical and Category-Specific Continuous Inputs [Code]
Automatic Ensemble Learning for Online Influence Maximization
Multi-Agent Thompson Sampling for Bandit Applications with Sparse Neighbourhood Structures [Code]
Information-Theoretic Confidence Bounds for Reinforcement Learning
Adaptive Portfolio by Solving Multi-armed Bandit via Thompson Sampling
Incentivized Exploration for Multi-Armed Bandits under Reward Drift
Safe Linear Thompson Sampling with Side Information
On Online Learning in Kernelized Markov Decision Processes
On Batch Bayesian Optimization
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints
Thompson Sampling via Local Uncertainty [Code]
Fixed-Confidence Guarantees for Bayesian Best-Arm Identification
Thompson Sampling in Non-Episodic Restless Bandits
Regret Analysis of Bandit Problems with Causal Background Knowledge
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems [Code]
Robust Dynamic Assortment Optimization in the Presence of Outlier Customers
A Quantile-based Approach for Hyperparameter Transfer Learning
A Copula approach for hyperparameter transfer learning
Efficient Multivariate Bandit Algorithm with Path Planning
An Arm-Wise Randomization Approach to Combinatorial Linear Semi-Bandits
Online Causal Inference for Advertising in Real-Time Bidding Auctions
A Batched Multi-Armed Bandit Approach to News Headline Testing
A Bayesian Choice Model for Eliminating Feedback Loops
Thompson Sampling with Approximate Inference
Scaling Multi-Armed Bandit Algorithms
Convergence Rates of Posterior Distributions in Markov Decision Process
Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning [Code]
Thompson Sampling on Symmetric α-Stable Bandits
Thompson Sampling for Combinatorial Network Optimization in Unknown Environments
Mixed-Variable Bayesian Optimization
Bandit Learning for Diversified Interactive Recommendation
Thompson Sampling for Adversarial Bit Prediction
Revised Progressive-Hedging-Algorithm Based Two-layer Solution Scheme for Bayesian Reinforcement Learning
Sparse Spectrum Gaussian Process for Bayesian Optimization
Stochastic Neural Network with Kronecker Flow
The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation
Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems [Code]
Connections Between Mirror Descent, Thompson Sampling and the Information Ratio
Feedback graph regret bounds for Thompson Sampling and UCB
Adaptive Model Selection Framework: An Application to Airline Pricing
Adaptive Sensor Placement for Continuous Spaces
On the Performance of Thompson Sampling on Logistic Bandits
Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling [Code]
AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning
Page 10 of 14

No leaderboard results yet.