SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it draws a belief at random from the current posterior over the reward model and chooses the action that maximizes the expected reward under that draw.
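As a rough illustration of the description above, here is a minimal sketch of Thompson sampling for a simulated Bernoulli bandit. It is not taken from any of the listed papers. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the arm probabilities are all assumptions chosen for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, unknown to the agent.
    Returns the number of pulls per arm and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform, on each arm's success probability.
    alpha = [1] * n_arms  # 1 + observed successes
    beta = [1] * n_arms   # 1 + observed failures
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Choose the action that maximizes expected reward under that belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulated environment: Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update of the chosen arm's Beta posterior.
        if reward:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.2, 0.5, 0.8], n_rounds=2000)
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

Because each arm is chosen with the probability that it looks best under the posterior, arms with uncertain estimates still get explored early on, while pulls concentrate on the best arm as evidence accumulates.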

Papers

Showing 521–530 of 655 papers

Title | Status | Hype
Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0
Thompson Sampling for Pursuit-Evasion Problems |  | 0
Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0
A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting |  | 0
Combining Bayesian Optimization and Lipschitz Optimization |  | 0
Thompson Sampling Algorithms for Cascading Bandits |  | 0
Contextual Multi-Armed Bandits for Causal Marketing |  | 0
Efficient Linear Bandits through Matrix Sketching |  | 0
Incorporating Behavioral Constraints in Online AI Systems |  | 0
Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms |  | 0

No leaderboard results yet.