SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief at random from the current posterior over reward models and chooses the action that maximizes the expected reward under that drawn belief.
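
As a rough illustration of this idea, here is a minimal sketch for a Bernoulli bandit, assuming a uniform Beta(1, 1) prior on each arm's success probability. The function names (`thompson_select`, `run_bandit`) and the example reward probabilities are hypothetical, chosen only for this sketch; they do not come from any of the papers listed here.

```python
import random


def thompson_select(successes, failures, rng):
    """Pick an arm by drawing one sample per arm from its Beta posterior.

    Assumes a Beta(1, 1) prior, so the posterior for an arm with s successes
    and f failures is Beta(s + 1, f + 1).
    """
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # The "randomly drawn belief" is the vector of samples; act greedily on it.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, n_rounds, seed=0):
    """Simulate Thompson sampling and return the pull count for each arm.

    true_probs holds the hidden success probabilities (illustrative values
    supplied by the caller; the algorithm never reads them directly).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        arm = thompson_select(successes, failures, rng)
        # Observe a Bernoulli reward from the chosen arm.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    pulls = run_bandit([0.2, 0.5, 0.8], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Arms with little data have wide posteriors, so their samples are occasionally the largest and they get explored; as evidence accumulates, the posteriors narrow and the pulls tend to concentrate on the arm with the highest estimated reward.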

Papers

Showing 1-10 of 655 papers

Title | Status | Hype
Sample-Efficient Alignment for LLMs | Code | 4
Steering Generative Models with Experimental Data for Protein Fitness Optimization | Code | 1
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding | Code | 1
Batched Bayesian optimization by maximizing the probability of including the optimum | Code | 1
A Bayesian Approach to Online Planning | Code | 1
Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search | Code | 1
qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling | Code | 1
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining | Code | 1
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | Code | 1

No leaderboard results yet.