SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
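
The description above can be made concrete with a minimal sketch. It assumes a Bernoulli bandit with independent Beta(1, 1) priors on each arm, which is a common textbook setting and not something this page specifies. The names `thompson_step` and `run_bandit` are hypothetical helpers introduced only for illustration. The "randomly drawn belief" is one sample per arm from its posterior, and the chosen action is the arm whose sampled mean is largest.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an arm by sampling one belief and acting greedily on it.

    Assumes Bernoulli rewards and a Beta(1, 1) prior per arm, so the
    posterior for arm i is Beta(successes[i] + 1, failures[i] + 1).
    """
    # One randomly drawn belief: a plausible mean reward for every arm.
    sampled_means = [
        rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)
    ]
    # Maximize expected reward with respect to that sampled belief.
    return max(range(len(sampled_means)), key=sampled_means.__getitem__)


def run_bandit(true_means, horizon, seed=0):
    """Simulate Thompson sampling on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(horizon):
        arm = thompson_step(successes, failures, rng)
        # Draw a 0/1 reward from the (hidden) true mean of the chosen arm.
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    wins, losses = run_bandit([0.1, 0.5, 0.9], horizon=2000)
    pulls = [w + l for w, l in zip(wins, losses)]
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get tried early on (exploration), while arms with consistently high posteriors are chosen most of the time (exploitation). Other reward models would need a different posterior update; the Beta-Bernoulli pair is used here only because its update is a simple count.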

Papers

Showing 1-25 of 655 papers

Title | Status | Hype
Sample-Efficient Alignment for LLMs | Code | 4
Seamlessly Unifying Attributes and Items: Conversational Recommendation for Cold-Start Users | Code | 1
Approximate Thompson Sampling via Epistemic Neural Networks | Code | 1
Meta-Learning Stationary Stochastic Process Prediction with Convolutional Neural Processes | Code | 1
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo | Code | 1
Sample-Then-Optimize Batch Neural Thompson Sampling | Code | 1
Deep Bandits Show-Off: Simple and Efficient Exploration with Deep Networks | Code | 1
Bayesian Optimization over Permutation Spaces | Code | 1
Federated Bayesian Optimization via Thompson Sampling | Code | 1
Mercer Features for Efficient Combinatorial Bayesian Optimization | Code | 1
On Isometry Robustness of Deep 3D Point Cloud Models under Adversarial Attacks | Code | 1
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding | Code | 1
qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling | Code | 1
Neural Exploitation and Exploration of Contextual Bandits | Code | 1
A Bayesian Approach to Online Planning | Code | 1
An empirical evaluation of active inference in multi-armed bandits | Code | 1
A Tutorial on Thompson Sampling | Code | 1
Batched Bayesian optimization by maximizing the probability of including the optimum | Code | 1
Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling | Code | 1
EE-Net: Exploitation-Exploration Neural Networks in Contextual Bandits | Code | 1
Langevin Monte Carlo for Contextual Bandits | Code | 1
Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning | Code | 1
Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search | Code | 1
Neural Thompson Sampling | Code | 1
Steering Generative Models with Experimental Data for Protein Fitness Optimization | Code | 1

No leaderboard results yet.