SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
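As a rough illustration of the definition above, here is a minimal sketch for a Bernoulli bandit. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm, neither of which is stated on this page. The function name `thompson_sampling` and its parameters are hypothetical. Each round draws one sample per arm from its posterior (the "randomly drawn belief") and plays the arm whose sample is largest.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns (pull counts per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Assumed Beta(1, 1) prior per arm: alpha counts successes, beta counts failures.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one belief about each arm's mean reward from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the arm that was played.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

Arms with uncertain posteriors occasionally produce large samples and get explored, while arms with many observed failures are sampled less and less often, so exploration fades without an explicit schedule.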

Papers

Showing 491-500 of 655 papers

| Title | Status | Hype |
|---|---|---|
| Stochastic Neural Network with Kronecker Flow | | 0 |
| The Intrinsic Robustness of Stochastic Bandits to Strategic Manipulation | | 0 |
| Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems | Code | 0 |
| Connections Between Mirror Descent, Thompson Sampling and the Information Ratio | | 0 |
| Feedback graph regret bounds for Thompson Sampling and UCB | | 0 |
| Adaptive Model Selection Framework: An Application to Airline Pricing | | 0 |
| Adaptive Sensor Placement for Continuous Spaces | | 0 |
| On the Performance of Thompson Sampling on Logistic Bandits | | 0 |
| Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling | Code | 0 |
| AutoSeM: Automatic Task Selection and Mixing in Multi-Task Learning | | 0 |

No leaderboard results yet.