SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
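As a rough illustration of "maximizing expected reward with respect to a randomly drawn belief", the sketch below assumes the simplest common setting: Bernoulli (0/1) rewards with an independent uniform Beta(1, 1) prior on each arm. The function names `thompson_select` and `run_bandit`, the prior, and the simulated reward probabilities are illustrative assumptions, not something taken from this page or from any particular paper listed on it.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one plausible mean per arm from its Beta posterior, then act greedily.

    Assumes Bernoulli rewards and a Beta(1, 1) prior on every arm (an
    illustrative choice), so the posterior is Beta(successes + 1, failures + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # The sampled means are the "randomly drawn belief"; choose its best arm.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, rounds, seed=0):
    """Simulate Thompson sampling on a hypothetical Bernoulli bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_means[arm].
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.9], rounds=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they are occasionally sampled high and get explored; as evidence accumulates the posteriors narrow and the choice concentrates on the arm that appears best.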

Papers

Showing 1-25 of 655 papers

Title | Status | Hype
Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments | - | 0
Context Attribution with Multi-Armed Bandit Optimization | - | 0
Adaptive Data Augmentation for Thompson Sampling | - | 0
Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient? | - | 0
Efficient kernelized bandit algorithms via exploration distributions | - | 0
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget | - | 0
Thompson Sampling in Online RLHF with General Function Approximation | - | 0
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling | - | 0
Stable Thompson Sampling: Valid Inference via Variance Inflation | - | 0
Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection | - | 0
Representative Action Selection for Large Action-Space Meta-Bandits | Code | 0
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine | - | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | - | 0
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions | - | 0
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits | - | 0
Steering Generative Models with Experimental Data for Protein Fitness Optimization | Code | 1
Dynamic Decision-Making under Model Misspecification | - | 0
Addressing Missing Data Issue for Diffusion-based Recommendation | Code | 0
Thompson Sampling-like Algorithms for Stochastic Rising Bandits | - | 0
Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control | - | 0
Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret | - | 0
Bayesian learning of the optimal action-value function in a Markov decision process | - | 0
Neural Contextual Bandits Under Delayed Feedback Constraints | - | 0
Counterfactual Inference under Thompson Sampling | - | 0
Dynamic Assortment Selection and Pricing with Censored Preference Feedback | Code | 0

No leaderboard results yet.