
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
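Concretely, for a Bernoulli multi-armed bandit this means keeping a Beta posterior per arm, sampling one value from each posterior, and pulling the arm whose sample is largest. The sketch below is a minimal illustration with made-up arm probabilities, not code from any paper listed on this page:

```python
import random

def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a multi-armed bandit.

    true_probs: hidden success probability of each arm (simulation only).
    Returns per-arm success/failure counts and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior on each arm
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one belief sample per arm, then act greedily on the draw:
        # this single random draw is what balances exploration and exploitation.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward       # posterior update on the pulled arm
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward
```

Because an arm's posterior concentrates as it is pulled, clearly inferior arms are sampled above the leader less and less often, so play converges toward the best arm without any explicit exploration schedule.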

Papers

Showing 51–75 of 655 papers

Title | Status | Hype
Counterfactual Inference under Thompson Sampling | — | 0
Sparse Nonparametric Contextual Bandits | — | 0
Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | — | 0
An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | — | 0
Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | — | 0
When and why randomised exploration works (in linear bandits) | — | 0
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems | — | 0
Contextual Thompson Sampling via Generation of Missing Data | — | 0
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces | — | 0
FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling | Code | 0
Active RLHF via Best Policy Learning from Trajectory Preference Feedback | — | 0
EVaDE: Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning | — | 0
WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings | — | 0
Stochastically Constrained Best Arm Identification with Thompson Sampling | — | 0
Truthful mechanisms for linear bandit games with private contexts | — | 0
On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise | — | 0
Generalized Bayesian deep reinforcement learning | — | 0
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits | — | 0
BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | — | 0
Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0
Epinet for Content Cold Start | — | 0
Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0
Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem | — | 0
BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration | — | 0
Page 3 of 27

No leaderboard results yet.