
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
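As a concrete illustration of "maximizing reward with respect to a randomly drawn belief", here is a minimal sketch of Thompson sampling for a Bernoulli multi-armed bandit, using Beta(1, 1) priors over each arm's success probability. The function names and parameters are illustrative, not from any paper listed below.

```python
import random

def thompson_step(successes, failures):
    """Draw one sample from each arm's Beta posterior and pick the arm
    whose sampled mean is highest (Beta(1,1) prior, Bernoulli rewards)."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

def run_bandit(true_means, n_rounds=5000, seed=0):
    """Simulate a Bernoulli bandit; returns per-arm success/failure counts."""
    random.seed(seed)
    k = len(true_means)
    successes, failures = [0] * k, [0] * k
    for _ in range(n_rounds):
        arm = thompson_step(successes, failures)
        if random.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures
```

Because the posterior for a clearly inferior arm concentrates below that of the best arm, the random draws select bad arms less and less often, which is how Thompson sampling trades off exploration against exploitation without an explicit exploration schedule.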

Papers

Showing 251–300 of 655 papers

Title | Status | Hype
From Predictions to Decisions: The Importance of Joint Predictive Distributions | - | 0
Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems | - | 0
Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space | - | 0
Expected Improvement-based Contextual Bandits | - | 0
Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation | - | 0
An Information-Theoretic Analysis of Thompson Sampling | - | 0
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | - | 0
Feel-Good Thompson Sampling for Contextual Bandits and Reinforcement Learning | - | 0
An Information-Theoretic Analysis for Thompson Sampling with Many Actions | - | 0
Adaptively Learning to Select-Rank in Online Platforms | - | 0
Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation | - | 0
Incentivized Exploration for Multi-Armed Bandits under Reward Drift | - | 0
Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization | - | 0
Bayesian Optimization-Based Beam Alignment for MmWave MIMO Communication Systems | - | 0
A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | - | 0
Feel-Good Thompson Sampling for Contextual Dueling Bandits | - | 0
Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient? | - | 0
Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits | - | 0
First-Order Bayesian Regret Analysis of Thompson Sampling | - | 0
Fixed-Confidence Guarantees for Bayesian Best-Arm Identification | - | 0
Fourier Representations for Black-Box Optimization over Categorical Variables | - | 0
Freshness-Aware Thompson Sampling | - | 0
From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information | - | 0
Fully Distributed Bayesian Optimization with Stochastic Policies | - | 0
Gaussian Process Thompson Sampling via Rootfinding | - | 0
Generalized Bayesian deep reinforcement learning | - | 0
Generalized Probabilistic Bisection for Stochastic Root-Finding | - | 0
Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors | - | 0
Generalized Thompson Sampling for Contextual Bandits | - | 0
Best Arm Identification in Batched Multi-armed Bandit Problems | - | 0
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions | - | 0
Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits | - | 0
Graph Neural Thompson Sampling | - | 0
Feedback graph regret bounds for Thompson Sampling and UCB | - | 0
Greedy Bandits with Sampled Context | - | 0
Greedy k-Center from Noisy Distance Samples | - | 0
GuideBoot: Guided Bootstrap for Deep Contextual Bandits | - | 0
GUTS: Generalized Uncertainty-Aware Thompson Sampling for Multi-Agent Active Search | - | 0
gym-saturation: Gymnasium environments for saturation provers (System description) | - | 0
Hierarchical Bayesian Bandits | - | 0
High-dimensional near-optimal experiment design for drug discovery via Bayesian sparse sampling | - | 0
Horde of Bandits using Gaussian Markov Random Fields | - | 0
Human collective intelligence as distributed Bayesian inference | - | 0
Hypermodels for Exploration | - | 0
IBAC: An Intelligent Dynamic Bandwidth Channel Access Avoiding Outside Warning Range Problem | - | 0
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning | - | 0
Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems | - | 0
Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration | - | 0
Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems | - | 0
Fast online inference for nonlinear contextual bandit based on Generative Adversarial Network | - | 0
Page 6 of 14
