SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
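The belief-sampling step described above can be sketched for the classic Bernoulli bandit, where each arm's unknown reward probability is given a Beta posterior: at every round, sample one value per arm from its posterior and pull the arm whose sample is largest. This is a minimal illustration of the general idea, not an implementation from any paper listed below; the function names and the arm probabilities are invented for the example.

```python
import random

def thompson_step(successes, failures):
    """Sample each arm's Beta(successes+1, failures+1) posterior; return the argmax arm."""
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_probs, horizon=2000, seed=0):
    """Run Thompson sampling on a Bernoulli bandit with the given arm probabilities."""
    random.seed(seed)
    k = len(true_probs)
    successes, failures = [0] * k, [0] * k
    for _ in range(horizon):
        arm = thompson_step(successes, failures)
        if random.random() < true_probs[arm]:   # Bernoulli reward
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

succ, fail = run_bandit([0.3, 0.5, 0.7])
pulls = [s + f for s, f in zip(succ, fail)]
# After enough rounds, the best arm (index 2) attracts most of the pulls,
# while the posteriors of the worse arms stay wide enough to permit occasional exploration.
```

Because exploration comes from posterior randomness rather than an explicit epsilon schedule, arms that have been tried rarely retain wide posteriors and keep getting sampled until the evidence rules them out.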

Papers

Showing 151–160 of 655 papers

Title | Status | Hype
----- | ------ | ----
Little Exploration is All You Need | | 0
qPOTS: Efficient batch multiobjective Bayesian optimization via Pareto optimal Thompson sampling | Code | 1
Making RL with Preference-based Feedback Efficient via Randomization | | 0
Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization | | 0
Using Adaptive Bandit Experiments to Increase and Investigate Engagement in Mental Health | Code | 0
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining | Code | 1
Optimal Exploration is no harder than Thompson Sampling | | 0
Module-wise Adaptive Distillation for Multimodality Foundation Models | | 0
Thompson Exploration with Best Challenger Rule in Best Arm Identification | | 0
From Bandits Model to Deep Deterministic Policy Gradient, Reinforcement Learning with Contextual Information | | 0
Page 16 of 66
