SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
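The description above can be made concrete with a minimal sketch for the Bernoulli bandit case. It is an illustration, not taken from any of the listed papers: the function name `thompson_sampling`, the simulated `true_probs`, and the uniform Beta(1, 1) prior are all assumptions. In each round, one belief is drawn from the posterior (a sampled mean reward per arm), and the arm that maximizes the reward under that draw is played.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch)."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # count of reward 1 observed per arm
    failures = [0] * n_arms   # count of reward 0 observed per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sampled mean reward per arm from its
        # Beta posterior. The Beta(1, 1) prior is an assumption of this sketch.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulated environment: Bernoulli reward with the arm's true probability.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high samples, they still get tried (exploration), while arms with consistently high posterior means are chosen most often (exploitation).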

Papers

Showing 41–50 of 655 papers

Title | Status | Hype
WAPTS: A Weighted Allocation Probability Adjusted Thompson Sampling Algorithm for High-Dimensional and Sparse Experiment Settings | | 0
On Improved Regret Bounds In Bayesian Optimization with Gaussian Noise | | 0
Generalized Bayesian deep reinforcement learning | | 0
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits | | 0
BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings | | 0
Fast, Precise Thompson Sampling for Bayesian Optimization | Code | 0
Epinet for Content Cold Start | | 0
Sample-Efficient Alignment for LLMs | Code | 4
Minimum Empirical Divergence for Sub-Gaussian Linear Bandits | Code | 0
Planning and Learning in Risk-Aware Restless Multi-Arm Bandit Problem | | 0

No leaderboard results yet.