SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
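As a rough illustration of the definition above, here is a minimal sketch for the Bernoulli bandit case. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm's success probability. Neither assumption comes from this page. The function name `thompson_sampling` and the `true_probs` argument are hypothetical, chosen only for the example. Each round draws one belief per arm from its posterior and plays the arm whose drawn belief has the highest expected reward.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (hypothetical helper).

    true_probs: success probability of each arm, unknown to the learner
                and used here only to simulate rewards.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, for every arm.
    alpha = [1] * n_arms  # 1 + observed successes
    beta = [1] * n_arms   # 1 + observed failures
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief about each arm's mean reward from its posterior.
        sampled = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # For Bernoulli rewards the expected reward under the drawn belief
        # is the sampled mean itself, so play the arm with the largest draw.
        arm = max(range(n_arms), key=lambda a: sampled[a])
        # Simulate the reward and update that arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000))
```

Because arms with uncertain posteriors occasionally produce large draws, they still get explored, while arms with consistently high observed reward are played most often.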

Papers

Showing 411–420 of 655 papers

| Title | Status | Hype |
| --- | --- | --- |
| Reinforcement Learning with Trajectory Feedback | | 0 |
| Lenient Regret for Multi-Armed Bandits | | 0 |
| IntelligentPooling: Practical Thompson Sampling for mHealth | | 0 |
| Greedy Bandits with Sampled Context | | 0 |
| Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems | | 0 |
| Variable Selection via Thompson Sampling | | 0 |
| Policy Gradient Optimization of Thompson Sampling Policies | | 0 |
| Asynchronous Multi Agent Active Search | | 0 |
| Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect | | 0 |
| Constrained Thompson Sampling for Real-Time Electricity Pricing with Grid Reliability Constraints | | 0 |

No leaderboard results yet.