Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
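The idea can be sketched for the Bernoulli multi-armed bandit: keep a Beta posterior over each arm's success probability, draw one sample per arm from those posteriors, and play the arm whose sampled mean is largest. The arm count, reward probabilities, and function name below are illustrative assumptions, not taken from any paper listed on this page.

```python
# Minimal sketch of Thompson sampling for a Bernoulli bandit.
# The reward probabilities passed in are hypothetical example values.
import random

def thompson_sampling(true_probs, n_rounds=1000, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) prior on each arm's unknown success probability.
    successes = [1] * n_arms
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Draw one sample from each arm's posterior belief...
        samples = [rng.betavariate(successes[i], failures[i])
                   for i in range(n_arms)]
        # ...and act greedily with respect to that random draw.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures, total_reward
```

Because the posterior of a clearly inferior arm concentrates below that of the best arm, the random draws select the best arm increasingly often, which is how the sketch balances exploration and exploitation.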

Papers

Showing 426-450 of 655 papers

Smart Routing with Precise Link Estimation: DSEE-Based Anypath Routing for Reliable Wireless Networking
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling
Sparse Nonparametric Contextual Bandits
Sparse Spectrum Gaussian Process for Bayesian Optimization
Speculative Decoding via Early-exiting for Faster LLM Inference with Thompson Sampling Control Mechanism
SPRT-based Efficient Best Arm Identification in Stochastic Bandits
Stable Thompson Sampling: Valid Inference via Variance Inflation
Stage-wise Conservative Linear Bandits
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits
Stochastically Constrained Best Arm Identification with Thompson Sampling
Stochastic Neural Network with Kronecker Flow
Streaming kernel regression with provably adaptive mean, variance, and regularization
Surrogate modeling for Bayesian optimization beyond a single Gaussian process
Synthetically Controlled Bandits
Taming Non-stationary Bandits: A Bayesian Approach
Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits
Cramming Contextual Bandits for On-policy Statistical Evaluation
The Effect of Communication on Noncooperative Multiplayer Multi-Armed Bandit Problems
The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits
The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation
The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling
The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies
The Typical Behavior of Bandit Algorithms
Thompson Exploration with Best Challenger Rule in Best Arm Identification
Page 18 of 27