SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief at random from the current posterior over the unknown reward model and chooses the action that maximizes the expected reward under that sampled belief.
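As an illustration of the description above, here is a minimal sketch for the Bernoulli bandit case. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm's success probability; the function name `thompson_sampling` and its parameters are hypothetical and not taken from any of the listed papers.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hidden success probability of each arm (used only to simulate rewards).
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its Beta posterior (assumed Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.2, 0.5, 0.7], n_rounds=5000)
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get tried early on; as evidence accumulates, the samples concentrate and the pulls shift toward the arm with the highest estimated reward.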

Papers

Showing 561–570 of 655 papers

Title | Status | Hype
Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0
Online Learning of Decision Trees with Thompson Sampling | Code | 0
Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0
Vaccine allocation policy optimization and budget sharing mechanism using Thompson sampling | Code | 0
Bayesian Algorithms for Decentralized Stochastic Bandits | Code | 0
FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling | Code | 0
Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning | Code | 0
State-Aware Variational Thompson Sampling for Deep Q-Networks | Code | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0

No leaderboard results yet.