SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, who proposed it in 1933, is a heuristic for choosing actions that addresses the exploration–exploitation dilemma in the multi-armed bandit problem. At each round it draws a belief (a sample from the posterior over the unknown reward parameters) and plays the action that maximizes expected reward under that sampled belief.
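As a concrete illustration, the posterior-sampling loop can be sketched for a Bernoulli bandit with conjugate Beta(1, 1) priors; the function name and arm probabilities below are illustrative, not from any paper listed here:

```python
import random

def thompson_sampling(true_probs, horizon, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors per arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta posterior parameters per arm: alpha = successes + 1, beta = failures + 1.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample from each arm's posterior belief...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # ...and play the arm that is best under the sampled belief.
        arm = max(range(n_arms), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate Bayesian update of the played arm's posterior.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward, alpha, beta
```

Because an arm is chosen exactly when its posterior sample is largest, arms with uncertain posteriors still get explored, while the posterior of the best arm concentrates and dominates the draws over time.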

Papers

Showing 26–50 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Sparse Nonparametric Contextual Bandits | — | 0 |
| Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers | Code | 0 |
| Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | — | 0 |
| An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces | — | 0 |
| Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | — | 0 |
| When and why randomised exploration works (in linear bandits) | — | 0 |
| KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems | — | 0 |
| Contextual Thompson Sampling via Generation of Missing Data | — | 0 |
| An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces | — | 0 |
| Active RLHF via Best Policy Learning from Trajectory Preference Feedback | — | 0 |
| FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling | Code | 0 |
| Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1 |
| EVaDE: Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning | — | 0 |
| Stochastically Constrained Best Arm Identification with Thompson Sampling | — | 0 |
| Truthful mechanisms for linear bandit games with private contexts | — | 0 |
| WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings | — | 0 |
| On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise | — | 0 |
| Generalized Bayesian deep reinforcement learning | — | 0 |
| An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits | — | 0 |
| BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | — | 0 |
| Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0 |
| Epinet for Content Cold Start | — | 0 |
| Sample-Efficient Alignment for LLMs | Code | 4 |
| Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0 |
| Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem | — | 0 |
Page 2 of 27
