SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
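
As a rough illustration of the definition above, here is a minimal sketch for the Bernoulli bandit case, assuming a uniform Beta(1, 1) prior on each arm. The function name `thompson_sampling` and the `true_probs` argument are hypothetical and exist only to simulate rewards. The "randomly drawn belief" is one posterior sample per arm, and the chosen action is the arm whose sampled success probability is largest.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs is a hypothetical list of per-arm success probabilities,
    used only to simulate rewards; the agent never reads it directly.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's success rate.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Choose the action that is best under the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated Bernoulli reward for the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: successes go to alpha, failures to beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward, alpha, beta


if __name__ == "__main__":
    pulls, total_reward, alpha, beta = thompson_sampling([0.2, 0.5, 0.8], 2000)
    print("pulls per arm:", pulls)
    print("total reward:", total_reward)
```

Arms with few observations have wide posteriors and are still sampled as the best arm now and then, which is where the exploration comes from. As evidence accumulates the posteriors narrow and the choices concentrate on the arm that appears best.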

Papers

Showing 31-40 of 655 papers

Title | Status | Hype
Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient? | | 0
Efficient kernelized bandit algorithms via exploration distributions | | 0
Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget | | 0
Simplifying Bayesian Optimization Via In-Context Direct Optimum Sampling | | 0
Stable Thompson Sampling: Valid Inference via Variance Inflation | | 0
Thompson Sampling in Online RLHF with General Function Approximation | | 0
Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection | | 0
Representative Action Selection for Large Action-Space Meta-Bandits | Code | 0
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine | | 0
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions | | 0

No leaderboard results yet.