Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
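As a concrete illustration of "maximizing reward with respect to a randomly drawn belief," here is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit with Beta posteriors. The arm probabilities, round count, and function name are illustrative assumptions, not taken from any paper listed below.

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    Each round: draw one win-rate belief per arm from its Beta
    posterior, pull the arm whose *sampled* belief is highest,
    then update that arm's posterior with the observed reward.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # posterior successes + 1 (uniform Beta(1, 1) prior)
    beta = [1] * k   # posterior failures + 1
    pulls = [0] * k
    for _ in range(n_rounds):
        # Randomly drawn belief for every arm, then act greedily on the draw.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.3, 0.5, 0.7])
print(pulls)  # the 0.7 arm should receive the large majority of pulls
```

Because the draw is random, arms with uncertain posteriors occasionally produce the highest sample and get explored, while arms with confidently high posteriors are exploited most of the time.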

Papers

Showing 151–200 of 655 papers

| Title | Status | Hype |
|---|---|---|
| Little Exploration is All You Need | | 0 |
| qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling | Code | 1 |
| Making RL with Preference-based Feedback Efficient via Randomization | | 0 |
| Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization | | 0 |
| Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health | Code | 0 |
| Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining | Code | 1 |
| Optimal Exploration is no harder than Thompson Sampling | | 0 |
| Module-wise Adaptive Distillation for Multimodality Foundation Models | | 0 |
| Thompson Exploration with Best Challenger Rule in Best Arm Identification | | 0 |
| From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information | | 0 |
| Monte-Carlo tree search with uncertainty propagation via optimal transport | | 0 |
| Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits | | 0 |
| gym-saturation: Gymnasium environments for saturation provers (System description) | | 0 |
| Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors | | 0 |
| Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages | Code | 0 |
| Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0 |
| Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit | | 0 |
| AdaptEx: A Self-Service Contextual Bandit Platform | | 0 |
| Bag of Policies for Distributional Deep Exploration | | 0 |
| VITS: Variational Inference Thompson Sampling for contextual bandits | Code | 0 |
| Approximate information for efficient exploration-exploitation strategies | | 0 |
| Thompson Sampling under Bernoulli Rewards with Local Differential Privacy | | 0 |
| Thompson sampling for improved exploration in GFlowNets | | 0 |
| Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits | | 0 |
| Scalable Neural Contextual Bandit for Recommender Systems | | 0 |
| Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | | 0 |
| Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space | | 0 |
| Incentivizing Exploration with Linear Contexts and Combinatorial Actions | | 0 |
| ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages | Code | 0 |
| Combinatorial Neural Bandits | | 0 |
| Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | Code | 1 |
| Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation | | 0 |
| Discounted Thompson Sampling for Non-Stationary Bandit Problems | | 0 |
| Sequential Best-Arm Identification with Application to Brain-Computer Interface | | 0 |
| Thompson Sampling for Parameterized Markov Decision Processes with Uninformative Actions | | 0 |
| An improved regret analysis for UCB-N and TS-N | | 0 |
| Trajectory-oriented optimization of stochastic epidemiological models | Code | 0 |
| Neural Exploitation and Exploration of Contextual Bandits | Code | 1 |
| Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0 |
| Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards | | 0 |
| Efficiently Tackling Million-Dimensional Multiobjective Problems: A Direction Sampling and Fine-Tuning Approach | | 0 |
| Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms | | 0 |
| GUTS: Generalized Uncertainty-Aware Thompson Sampling for Multi-Agent Active Search | | 0 |
| Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches | | 0 |
| Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling | | 0 |
| A Unified and Efficient Coordinating Framework for Autonomous DBMS Tuning | | 0 |
| A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms | | 0 |
| Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors | | 0 |
| The Choice of Noninformative Priors for Thompson Sampling in Multiparameter Bandit Models | | 0 |
| When Combinatorial Thompson Sampling meets Approximation Regret | | 0 |
Page 4 of 14
