SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson (1933), is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of drawing a belief about the environment at random from the posterior over models, then choosing the action that maximizes expected reward under that sampled belief.
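
As a concrete illustration, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors, so each posterior draw is a Beta sample; the function name, arm probabilities, and round count below are illustrative choices, not taken from any paper listed on this page:

```python
import random

def thompson_sampling_bernoulli(true_probs, n_rounds=2000, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors per arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    alpha = [1] * n_arms  # Beta alpha: 1 + observed successes
    beta = [1] * n_arms   # Beta beta:  1 + observed failures
    total_reward = 0
    for _ in range(n_rounds):
        # Sample one plausible success rate per arm from its posterior,
        # then act greedily with respect to the sampled rates.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the chosen arm and update its posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        if reward:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        total_reward += reward
    return total_reward, alpha, beta
```

Because each arm is played with probability equal to its posterior chance of being best, exploration fades naturally: after enough rounds, nearly all pulls concentrate on the highest-reward arm.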

Papers

Showing 1–50 of 655 papers

Each entry lists the paper title, a [Code] marker if code is available, and its hype score.

- Robust Policy Switching for Antifragile Reinforcement Learning for UAV Deconfliction in Adversarial Environments (Hype: 0)
- Context Attribution with Multi-Armed Bandit Optimization (Hype: 0)
- Adaptive Data Augmentation for Thompson Sampling (Hype: 0)
- Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient? (Hype: 0)
- Efficient kernelized bandit algorithms via exploration distributions (Hype: 0)
- Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget (Hype: 0)
- Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling (Hype: 0)
- Thompson Sampling in Online RLHF with General Function Approximation (Hype: 0)
- Stable Thompson Sampling: Valid Inference via Variance Inflation (Hype: 0)
- Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection (Hype: 0)
- Representative Action Selection for Large Action-Space Meta-Bandits [Code] (Hype: 0)
- Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine (Hype: 0)
- Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype (Hype: 0)
- Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions (Hype: 0)
- In-Domain African Languages Translation Using LLMs and Multi-armed Bandits (Hype: 0)
- Steering Generative Models with Experimental Data for Protein Fitness Optimization [Code] (Hype: 1)
- Dynamic Decision-Making under Model Misspecification (Hype: 0)
- Addressing Missing Data Issue for Diffusion-based Recommendation [Code] (Hype: 0)
- Thompson Sampling-like Algorithms for Stochastic Rising Bandits (Hype: 0)
- Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control (Hype: 0)
- Connecting Thompson Sampling and UCB: Towards More Efficient Trade-offs Between Privacy and Regret (Hype: 0)
- Bayesian learning of the optimal action-value function in a Markov decision process (Hype: 0)
- Neural Contextual Bandits Under Delayed Feedback Constraints (Hype: 0)
- Counterfactual Inference under Thompson Sampling (Hype: 0)
- Dynamic Assortment Selection and Pricing with Censored Preference Feedback [Code] (Hype: 0)
- Sparse Nonparametric Contextual Bandits (Hype: 0)
- Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers [Code] (Hype: 0)
- Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling (Hype: 0)
- An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces (Hype: 0)
- Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs (Hype: 0)
- When and why randomised exploration works (in linear bandits) (Hype: 0)
- KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems (Hype: 0)
- Contextual Thompson Sampling via Generation of Missing Data (Hype: 0)
- An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces (Hype: 0)
- Active RLHF via Best Policy Learning from Trajectory Preference Feedback (Hype: 0)
- FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling [Code] (Hype: 0)
- Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning [Code] (Hype: 1)
- EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning (Hype: 0)
- Stochastically Constrained Best Arm Identification with Thompson Sampling (Hype: 0)
- Truthful mechanisms for linear bandit games with private contexts (Hype: 0)
- WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings (Hype: 0)
- On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise (Hype: 0)
- Generalized Bayesian deep reinforcement learning (Hype: 0)
- An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits (Hype: 0)
- BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings (Hype: 0)
- Fast, Precise Thompson Sampling for Bayesian Optimization [Code] (Hype: 0)
- Epinet for Content Cold Start (Hype: 0)
- Sample-Efficient Alignment for LLMs [Code] (Hype: 4)
- Minimum Empirical Divergence for Sub-Gaussian Linear Bandits [Code] (Hype: 0)
- Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem (Hype: 0)
