SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
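
As a rough illustration of the description above, here is a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability; the function name `thompson_sampling` and the parameters `true_probs`, `n_rounds`, and `seed` are hypothetical, chosen only for this example. In each round a belief is drawn from every arm's posterior, and the arm whose drawn value is highest is played.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling sketch (illustrative, not canonical).

    true_probs: hidden success probability of each arm (assumed for simulation).
    Returns per-arm pull counts, success counts, and failure counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one belief per arm from its Beta posterior (Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Play the arm that is best under the randomly drawn belief.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the hidden success probability.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the posterior counts for the arm that was played.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce high draws, they still get explored, while arms with consistently high draws are played most often.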

Papers

Showing 481–490 of 655 papers

Title | Status | Hype
Thompson Sampling for the MNL-Bandit | | 0
Thompson Sampling for Unimodal Bandits | | 0
Thompson Sampling for Unsupervised Sequential Selection | | 0
Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study | | 0
Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems | | 0
Thompson Sampling in Dynamic Systems for Contextual Bandit Problems | | 0
Thompson Sampling in Non-Episodic Restless Bandits | | 0
Thompson Sampling in Online RLHF with General Function Approximation | | 0
Thompson Sampling in Partially Observable Contextual Bandits | | 0
Thompson Sampling is Asymptotically Optimal in General Environments | | 0

No leaderboard results yet.