
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each round, it draws a sample from the posterior belief over each action's reward distribution and selects the action that maximizes expected reward under that sampled belief; uncertain actions are therefore explored roughly in proportion to the probability that they are optimal.
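The procedure above can be sketched for the simplest case, a Bernoulli bandit with Beta posteriors. This is a minimal illustration, not any particular paper's implementation; the arm probabilities and round count are made up for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds=5000, seed=0):
    """Beta-Bernoulli Thompson sampling for a multi-armed bandit.

    Each arm keeps a Beta(successes + 1, failures + 1) posterior over its
    reward probability. Every round, one value is sampled from each arm's
    posterior and the arm with the highest sampled value is pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(n_rounds):
        # Draw a random belief about each arm's reward rate from its posterior.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        # Act greedily with respect to the sampled beliefs.
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures


# Hypothetical 3-armed bandit; the algorithm should concentrate on arm 2 (p=0.7).
total, wins, losses = thompson_sampling([0.2, 0.5, 0.7])
```

Because the posterior for a clearly inferior arm concentrates below the best arm's samples, pulls of bad arms taper off automatically, with no explicit exploration schedule to tune.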

Papers

Showing 1-50 of 655 papers

Title | Status | Hype
Sample-Efficient Alignment for LLMs | Code | 4
Steering Generative Models with Experimental Data for Protein Fitness Optimization | Code | 1
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding | Code | 1
Batched Bayesian optimization by maximizing the probability of including the optimum | Code | 1
A Bayesian Approach to Online Planning | Code | 1
Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search | Code | 1
qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling | Code | 1
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining | Code | 1
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | Code | 1
Neural Exploitation and Exploration of Contextual Bandits | Code | 1
Approximate Thompson Sampling via Epistemic Neural Networks | Code | 1
Sample-Then-Optimize Batch Neural Thompson Sampling | Code | 1
Langevin Monte Carlo for Contextual Bandits | Code | 1
Bayesian Optimization over Permutation Spaces | Code | 1
EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits | Code | 1
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks | Code | 1
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling | Code | 1
An empirical evaluation of active inference in multi-armed bandits | Code | 1
Mercer Features for Efficient Combinatorial Bayesian Optimization | Code | 1
Optimal Thompson Sampling strategies for support-aware CVaR bandits | Code | 1
Federated Bayesian Optimization via Thompson Sampling | Code | 1
Neural Thompson Sampling | Code | 1
Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes | Code | 1
Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-Start Users | Code | 1
On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks | Code | 1
A Tutorial on Thompson Sampling | Code | 1
Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments | - | 0
Context Attribution with Multi-Armed Bandit Optimization | - | 0
Adaptive Data Augmentation for Thompson Sampling | - | 0
Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient? | - | 0
Efficient kernelized bandit algorithms via exploration distributions | - | 0
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget | - | 0
Stable Thompson Sampling: Valid Inference via Variance Inflation | - | 0
Thompson Sampling in Online RLHF with General Function Approximation | - | 0
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling | - | 0
Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection | - | 0
Representative Action Selection for Large Action-Space Meta-Bandits | Code | 0
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine | - | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | - | 0
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions | - | 0
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits | - | 0
Dynamic Decision-Making under Model Misspecification | - | 0
Addressing Missing Data Issue for Diffusion-based Recommendation | Code | 0
Thompson Sampling-like Algorithms for Stochastic Rising Bandits | - | 0
Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control | - | 0
Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret | - | 0
Bayesian learning of the optimal action-value function in a Markov decision process | - | 0
Neural Contextual Bandits Under Delayed Feedback Constraints | - | 0
Counterfactual Inference under Thompson Sampling | - | 0
