SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
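As a minimal sketch of the idea, assume the simplest setting: a Bernoulli bandit where each arm pays 0 or 1, with an independent Beta(1, 1) prior on each arm's success probability. The "randomly drawn belief" is then one sample per arm from its Beta posterior, and the chosen action is the arm with the largest sample. The function name `thompson_sampling` and its parameters are illustrative assumptions, not taken from any paper listed here.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior.
        # The +1 terms encode the assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulated environment: Bernoulli reward with the arm's true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Posterior update for the pulled arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Arms with few observations have wide posteriors, so they occasionally produce the largest sample and get explored; as evidence accumulates, the posteriors narrow and the better arm is chosen most of the time.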

Papers

Showing 31-40 of 655 papers

Title | Status | Hype
When and why randomised exploration works (in linear bandits) | - | 0
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems | - | 0
Contextual Thompson Sampling via Generation of Missing Data | - | 0
An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces | - | 0
Active RLHF via Best Policy Learning from Trajectory Preference Feedback | - | 0
FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling | Code | 0
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1
EVaDE : Event-Based Variational Thompson Sampling for Model-Based Reinforcement Learning | - | 0
Truthful mechanisms for linear bandit games with private contexts | - | 0
Stochastically Constrained Best Arm Identification with Thompson Sampling | - | 0

No leaderboard results yet.