SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
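As a rough illustration of the "randomly drawn belief" idea, here is a minimal sketch for a Bernoulli bandit. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm's success probability; the function name `thompson_sampling` and the parameters `true_means`, `n_rounds`, and `seed` are hypothetical, chosen only for this example.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_means: hidden success probability of each arm (used only to simulate rewards).
    Returns the number of pulls, successes, and failures per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior.
        # Beta(successes + 1, failures + 1) corresponds to a uniform prior (an assumption).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under the drawn belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, _, _ = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because each arm's sample is drawn from its posterior, rarely tried arms still get picked occasionally while their uncertainty is large, and the pulls tend to concentrate on the best arm as evidence accumulates.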

Papers

Showing 421–430 of 655 papers

Title | Status | Hype
Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling | | 0
Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms | | 0
Simple Bayesian Algorithms for Best Arm Identification | | 0
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling | | 0
Sliding-Window Thompson Sampling for Non-Stationary Settings | | 0
Smart Routing with Precise Link Estimation: DSEE-Based Anypath Routing for Reliable Wireless Networking | | 0
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling | | 0
Sparse Nonparametric Contextual Bandits | | 0
Sparse Spectrum Gaussian Process for Bayesian Optimization | | 0
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism | | 0

No leaderboard results yet.