
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
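
To make the idea concrete, below is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit with Beta(1, 1) priors on each arm. The arm probabilities, round count, and function name are illustrative assumptions, not taken from any of the papers listed below.

```python
# Minimal Thompson sampling sketch for a Bernoulli bandit (illustrative only).
# Each arm's unknown success probability gets a Beta(1, 1) prior; on every round
# we sample one value per posterior, play the arm with the largest sample, and
# update that arm's posterior with the observed 0/1 reward.
import random

def thompson_sampling(true_probs, n_rounds=1000, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms   # Beta alpha parameters (prior = 1)
    failures = [1] * n_arms    # Beta beta parameters (prior = 1)
    total_reward = 0
    for _ in range(n_rounds):
        # Draw a belief about each arm's mean reward from its posterior ...
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # ... and play the arm whose sampled belief is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

print(thompson_sampling([0.2, 0.5, 0.7]))
```

Because arms with uncertain posteriors occasionally produce large samples, they keep getting tried (exploration), while arms with confidently high posteriors are played most of the time (exploitation).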

Papers

Showing 1–50 of 655 papers

Title | Status | Hype
Sample-Efficient Alignment for LLMs | Code | 4
Batched Bayesian optimization by maximizing the probability of including the optimum | Code | 1
Approximate Thompson Sampling via Epistemic Neural Networks | Code | 1
Steering Generative Models with Experimental Data for Protein Fitness Optimization | Code | 1
EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits | Code | 1
An empirical evaluation of active inference in multi-armed bandits | Code | 1
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks | Code | 1
Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes | Code | 1
Sample-Then-Optimize Batch Neural Thompson Sampling | Code | 1
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining | Code | 1
On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks | Code | 1
Langevin Monte Carlo for Contextual Bandits | Code | 1
A Tutorial on Thompson Sampling | Code | 1
A Bayesian Approach to Online Planning | Code | 1
Neural Exploitation and Exploration of Contextual Bandits | Code | 1
Bayesian Optimization over Permutation Spaces | Code | 1
Federated Bayesian Optimization via Thompson Sampling | Code | 1
Mercer Features for Efficient Combinatorial Bayesian Optimization | Code | 1
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding | Code | 1
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | Code | 1
Optimal Thompson Sampling strategies for support-aware CVaR bandits | Code | 1
Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search | Code | 1
qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling | Code | 1
Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-Start Users | Code | 1
Neural Thompson Sampling | Code | 1
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling | Code | 1
Adaptive Grey-Box Fuzz-Testing with Thompson Sampling | - | 0
Adaptive Gating for Single-Photon 3D Imaging | - | 0
A Combinatorial Semi-Bandit Approach to Charging Station Selection for Electric Vehicles | - | 0
A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms | - | 0
Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits | - | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | - | 0
Analyzing and Enhancing Queue Sampling for Energy-Efficient Remote Control of Bandits | - | 0
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches | - | 0
Adaptive Data Augmentation for Thompson Sampling | - | 0
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | - | 0
Adaptive Combinatorial Allocation | - | 0
A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits | - | 0
A Batched Multi-Armed Bandit Approach to News Headline Testing | - | 0
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | - | 0
An Analysis of Ensemble Sampling | - | 0
Aging Bandits: Regret Analysis and Order-Optimal Learning Algorithm for Wireless Networks with Stochastic Arrivals | - | 0
A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms | - | 0
Accelerating Grasp Exploration by Leveraging Learned Priors | - | 0
A General Theory of the Stochastic Linear Bandit and Its Applications | - | 0
A Formal Solution to the Grain of Truth Problem | - | 0
Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization | - | 0
Aligning AI Agents via Information-Directed Sampling | - | 0
AdaptEx: A Self-Service Contextual Bandit Platform | - | 0
