SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief (a set of model parameters) at random from the current posterior and chooses the action that maximizes the expected reward under that drawn belief.
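
The description above can be made concrete with a minimal sketch. It assumes Bernoulli (0/1) rewards and a uniform Beta(1, 1) prior on each arm, neither of which is stated on this page. The function name `thompson_sampling`, the simulated `true_probs`, and the `seed` parameter are hypothetical and used only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, unknown to the agent.
    Returns (pulls, successes, failures), one count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one plausible mean reward per arm,
        # sampled from its Beta(successes + 1, failures + 1) posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the drawn belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulated environment: Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Posterior update for the chosen arm only.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures
```

Calling `thompson_sampling([0.2, 0.5, 0.8], 2000)` would typically concentrate most pulls on the last arm: arms with few observations have wide posteriors, so they are still sampled occasionally (exploration), while arms with high observed reward rates win the draw most of the time (exploitation).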

Papers

Showing 221-230 of 655 papers

Title | Status | Hype
An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits | | 0
Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce | | 0
A Copula approach for hyperparameter transfer learning | | 0
Efficient and Adaptive Posterior Sampling Algorithms for Bandits | | 0
Efficient Benchmarking of NLP APIs using Multi-armed Bandits | | 0
Efficient Exploration for LLMs | | 0
Efficient exploration of zero-sum stochastic games | | 0
Bandits Under The Influence (Extended Version) | | 0
Efficient exploration with Double Uncertain Value Networks | | 0
Bayesian Optimization with LLM-Based Acquisition Functions for Natural Language Preference Elicitation | | 0

No leaderboard results yet.