
Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
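
As an illustration of the description above, here is a minimal sketch of Thompson sampling for a simulated bandit. It assumes binary (Bernoulli) rewards and a uniform Beta(1, 1) prior on each arm, neither of which is specified on this page. The function name `thompson_sampling` and the parameter `true_probs` are hypothetical names chosen for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on simulated arms.

    true_probs: hypothetical success probability of each arm (unknown to the
    agent; used only to simulate rewards).
    Returns (pulls, successes, failures), one entry per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior,
        # starting from the assumed uniform Beta(1, 1) prior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.5, 0.9], 2000)
    print("pulls per arm:", pulls)
```

Because each arm is picked only when its sampled belief is the largest, arms with uncertain posteriors still get tried occasionally (exploration), while arms with consistently high samples are chosen most often (exploitation).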

Papers

Showing 221–230 of 655 papers

Title | Status | Hype
Asymptotically Optimal Bandits under Weighted Information | | 0
A General Theory of the Stochastic Linear Bandit and Its Applications | | 0
Effects of Model Misspecification on Bayesian Bandits: Case Studies in UX Optimization | | 0
Efficient and Adaptive Posterior Sampling Algorithms for Bandits | | 0
Efficient Benchmarking of NLP APIs using Multi-armed Bandits | | 0
Efficient Exploration for LLMs | | 0
Efficient exploration of zero-sum stochastic games | | 0
Cost-efficient Knowledge-based Question Answering with Large Language Models | | 0
Efficient exploration with Double Uncertain Value Networks | | 0
Cost Aware Asynchronous Multi-Agent Active Search | | 0

No leaderboard results yet.