SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
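
The description above can be sketched for the simplest case, a Bernoulli bandit with a Beta posterior per arm. This is only an illustrative sketch under stated assumptions, not an implementation taken from any of the listed papers: the uniform Beta(1, 1) prior, the function names thompson_step and run_bandit, and the simulated reward probabilities are all hypothetical choices made for the example. The "randomly drawn belief" is one sample per arm from its posterior, and the chosen action is the arm whose sampled mean is largest.

```python
import random


def thompson_step(successes, failures, rng):
    """Draw one mean per arm from its Beta posterior and return the best arm.

    Assumes a uniform Beta(1, 1) prior, so the posterior for an arm with
    s successes and f failures is Beta(s + 1, f + 1).
    """
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (hypothetical setup)."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures, rng)
        # Simulated environment: reward is 1 with probability true_probs[arm].
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run_bandit([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", [a + b for a, b in zip(s, f)])
```

Sampling from the posterior, rather than always taking the arm with the highest posterior mean, is what provides exploration: an arm with few observations has a wide posterior and is still occasionally sampled as the best.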

Papers

Showing 401–450 of 655 papers

Title | Status | Hype
KLUCB Approach to Copeland Bandits | | 0
Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits | | 0
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | | 0
Latent Bandits Revisited | | 0
Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect | | 0
Sample Efficient Learning of Factored Embeddings of Tensor Fields | | 0
Learning How to Infer Partial MDPs for In-Context Adaptation and Exploration | | 0
Learning to Optimize Via Posterior Sampling | | 0
Learning to Price with Reference Effects | | 0
Learning to Rank in the Position Based Model with Bandit Feedback | | 0
Learning Unknown Markov Decision Processes: A Thompson Sampling Approach | | 0
Lenient Regret for Multi-Armed Bandits | | 0
Leveraging Demonstrations to Improve Online Learning: Quality Matters | | 0
Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control | | 0
Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits | | 0
Linear Bandit algorithms using the Bootstrap | | 0
Linear Thompson Sampling Revisited | | 0
Little Exploration is All You Need | | 0
Maillard Sampling: Boltzmann Exploration Done Optimally | | 0
Making RL with Preference-based Feedback Efficient via Randomization | | 0
Making Sense of Reinforcement Learning and Probabilistic Inference | | 0
Markov Decision Process modeled with Bandits for Sequential Decision Making in Linear-flow | | 0
Optimization-Driven Adaptive Experimentation | | 0
Memory Sequence Length of Data Sampling Impacts the Adaptation of Meta-Reinforcement Learning Agents | | 0
Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models | | 0
Meta Dynamic Pricing: Transfer Learning Across Experiments | | 0
Meta Learning in Bandits within Shared Affine Subspaces | | 0
Metalearning Linear Bandits by Prior Update | | 0
Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks | | 0
Meta-Reinforcement Learning With Informed Policy Regularization | | 0
Meta-Thompson Sampling | | 0
Minimal Exploration in Structured Stochastic Bandits | | 0
TS-RSR: A provably efficient approach for batch Bayesian Optimization | | 0
Mixed-Variable Bayesian Optimization | | 0
Model-based Meta Reinforcement Learning using Graph Structured Surrogate Models | | 0
Model-Free Approximate Bayesian Learning for Large-Scale Conversion Funnel Optimization | | 0
Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis | | 0
Module-wise Adaptive Distillation for Multimodality Foundation Models | | 0
Monte Carlo Tree Search Algorithms for Risk-Aware and Multi-Objective Reinforcement Learning | | 0
Monte-Carlo tree search with uncertainty propagation via optimal transport | | 0
MOTS: Minimax Optimal Thompson Sampling | | 0
Multi-Agent Active Search using Detection and Location Uncertainty | | 0
Multi-armed Bandit Algorithms on System-on-Chip: Go Frequentist or Bayesian? | | 0
Multi-Armed Bandit Strategies for Non-Stationary Reward Distributions and Delayed Feedback Processes | | 0
Multi-dueling Bandits with Dependent Arms | | 0
Multi-Task Combinatorial Bandits for Budget Allocation | | 0
Near Optimal Adversarial Attacks on Stochastic Bandits and Defenses with Smoothed Responses | | 0
Neural Contextual Bandits Under Delayed Feedback Constraints | | 0
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback | | 0
Neural Model-based Optimization with Right-Censored Observations | | 0

No leaderboard results yet.