SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
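
As an illustration of the description above, here is a minimal sketch for the simplest setting, assuming Bernoulli (0/1) rewards and an independent uniform Beta(1, 1) prior on each arm's success probability. The function name `thompson_sampling`, the arm probabilities, and the seed are hypothetical choices for this example and are not taken from any paper listed here. In each round, one success probability per arm is drawn from the current posterior (the "randomly drawn belief"), and the arm with the highest drawn value is played.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_probs: hidden success probability of each arm (used only to simulate rewards).
    Returns per-arm pull counts, successes, and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its Beta posterior (Beta(1, 1) prior assumed).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward and update the posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get explored, while arms that are clearly worse are played less and less often.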

Papers

Showing 161-170 of 655 papers

Title | Status | Hype
Monte-Carlo tree search with uncertainty propagation via optimal transport | - | 0
Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits | - | 0
gym-saturation: Gymnasium environments for saturation provers (System description) | - | 0
Generalized Regret Analysis of Thompson Sampling using Fractional Posteriors | - | 0
Simple Modification of the Upper Confidence Bound Algorithm by Generalized Weighted Averages | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit | - | 0
AdaptEx: A Self-Service Contextual Bandit Platform | - | 0
Bag of Policies for Distributional Deep Exploration | - | 0
VITS : Variational Inference Thompson Sampling for contextual bandits | Code | 0

No leaderboard results yet.