SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
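As an illustration of the description above, here is a minimal sketch of Thompson sampling for a Bernoulli bandit, assuming a uniform Beta(1, 1) prior on each arm. The function name `thompson_sampling_bernoulli`, the `true_probs` values, and the fixed seed are hypothetical choices made for this example and are not taken from any of the listed papers.

```python
import random


def thompson_sampling_bernoulli(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on simulated arms.

    `true_probs` are hypothetical success probabilities, used only to
    simulate rewards; the algorithm itself never reads them directly.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior per arm, stored as (alpha, beta) pseudo-counts.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one posterior sample per arm.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a 0/1 reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update of the chosen arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


pulls = thompson_sampling_bernoulli([0.2, 0.5, 0.8], n_rounds=2000)
print(pulls)
```

Early on the posteriors are wide, so every arm has a real chance of producing the largest sample (exploration). As evidence accumulates, the samples concentrate and the arm with the highest estimated reward is chosen most of the time (exploitation).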

Papers

Showing 1-50 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Sample-Efficient Alignment for LLMs | Code | 4 |
| Langevin Monte Carlo for Contextual Bandits | Code | 1 |
| Federated Bayesian Optimization via Thompson Sampling | Code | 1 |
| Optimal Thompson Sampling strategies for support-aware CVaR bandits | Code | 1 |
| A Tutorial on Thompson Sampling | Code | 1 |
| An empirical evaluation of active inference in multi-armed bandits | Code | 1 |
| Mercer Features for Efficient Combinatorial Bayesian Optimization | Code | 1 |
| Optimizing Posterior Samples for Bayesian Optimization via Rootfinding | Code | 1 |
| Steering Generative Models with Experimental Data for Protein Fitness Optimization | Code | 1 |
| Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining | Code | 1 |
| Bayesian Optimization over Permutation Spaces | Code | 1 |
| Neural Exploitation and Exploration of Contextual Bandits | Code | 1 |
| Approximate Thompson Sampling via Epistemic Neural Networks | Code | 1 |
| A Bayesian Approach to Online Planning | Code | 1 |
| Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | Code | 1 |
| Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1 |
| Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes | Code | 1 |
| On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks | Code | 1 |
| Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks | Code | 1 |
| Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-Start Users | Code | 1 |
| EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits | Code | 1 |
| Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search | Code | 1 |
| Sample-Then-Optimize Batch Neural Thompson Sampling | Code | 1 |
| qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling | Code | 1 |
| Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling | Code | 1 |
| Neural Thompson Sampling | Code | 1 |
| Batched Bayesian optimization by maximizing the probability of including the optimum | Code | 1 |
| Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0 |
| ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0 |
| Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0 |
| Scalable Exploration via Ensemble++ | Code | 0 |
| Efficient Exploration through Bayesian Deep Q-Networks | Code | 0 |
| Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0 |
| Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0 |
| Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0 |
| Double Thompson Sampling for Dueling Bandits | Code | 0 |
| Distributed Thompson sampling under constrained communication | Code | 0 |
| AIXIjs: A Software Demo for General Reinforcement Learning | Code | 0 |
| Differentially Private Online Bayesian Estimation With Adaptive Truncation | Code | 0 |
| Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints | Code | 0 |
| Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0 |
| Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0 |
| Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0 |
| Causal Bandits for Linear Structural Equation Models | Code | 0 |
| Process-constrained batch Bayesian approaches for yield optimization in multi-reactor systems | Code | 0 |
| Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0 |
| RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0 |
| Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic | Code | 0 |
| Bayesian Optimization for Categorical and Category-Specific Continuous Inputs | Code | 0 |

No leaderboard results yet.