SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
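The description above can be made concrete with a minimal sketch for the Bernoulli bandit case. It assumes a uniform Beta(1, 1) prior on each arm's success probability, which is a common textbook choice and is not specified by this page. The function name `thompson_sampling` and the `true_probs` parameter are hypothetical and exist only to simulate rewards for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs is a hypothetical list of per-arm success probabilities.
    It is used only to simulate rewards; the agent never reads it directly.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # count of reward 1 observed per arm
    failures = [0] * n_arms   # count of reward 0 observed per arm
    pulls = [0] * n_arms      # number of times each arm was chosen

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior
        # (assumed Beta(1, 1) prior, updated with the observed counts).
        sampled_means = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Choose the action that maximizes expected reward under that sample.
        arm = max(range(n_arms), key=lambda a: sampled_means[a])

        # Simulated Bernoulli reward from the environment.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.1, 0.9], n_rounds=2000)
    print("pulls per arm:", pulls)
```

Because each arm's sample comes from its posterior, arms with few observations still produce high samples from time to time and get explored, while arms with strong evidence of high reward are chosen most often.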

Papers

Showing 521-530 of 655 papers

Title | Status | Hype
Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0
Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0
A Unified Approach to Translate Classical Bandit Algorithms to the Structured Bandit Setting | - | 0
Combining Bayesian Optimization and Lipschitz Optimization | - | 0
Contextual Multi-Armed Bandits for Causal Marketing | - | 0
Thompson Sampling Algorithms for Cascading Bandits | - | 0
Efficient Linear Bandits through Matrix Sketching | - | 0
Incorporating Behavioral Constraints in Online AI Systems | - | 0
Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms | - | 0
Adaptive Grey-Box Fuzz-Testing with Thompson Sampling | - | 0

No leaderboard results yet.