SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step it draws a belief about the reward distributions at random from the current posterior and chooses the action that maximizes the expected reward under that sampled belief.
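The description above can be made concrete with a minimal sketch for the simplest setting, a Bernoulli bandit. Everything specific here is an assumption chosen for illustration and not taken from any listed paper: the uniform Beta(1, 1) priors, the function name `thompson_sampling`, and the example success probabilities. Each round draws one sample per arm from its Beta posterior (the "randomly drawn belief") and plays the arm whose sample is largest.

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm (unknown to the agent).
    Returns (pulls, alpha, beta): pull counts and the final Beta posterior parameters.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Assumed prior: Beta(1, 1), i.e. uniform over each arm's success probability.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pulls = [0] * n_arms

    for _ in range(rounds):
        # Draw one belief per arm from its current posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a success increments alpha, a failure increments beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, alpha, beta


if __name__ == "__main__":
    pulls, alpha, beta = thompson_sampling([0.2, 0.5, 0.8], rounds=2000)
    print("pulls per arm:", pulls)
```

Because arms with uncertain posteriors occasionally produce large samples, they still get explored, while arms whose posteriors concentrate on low values are played less and less often.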

Papers

Showing 331-340 of 655 papers

Title | Status | Hype
Online Algorithms For Parameter Mean And Variance Estimation In Dynamic Regression Models | | 0
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits | | 0
Online Causal Inference for Advertising in Real-Time Bidding Auctions | | 0
Online Learning and Distributed Control for Residential Demand Response | | 0
Online Learning-based Waveform Selection for Improved Vehicle Recognition in Automotive Radar | | 0
Online Learning of Energy Consumption for Navigation of Electric Vehicles | | 0
Online Learning of Network Bottlenecks via Minimax Paths | | 0
Online Residential Demand Response via Contextual Multi-Armed Bandits | | 0
Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling | | 0
On Multi-Armed Bandit Designs for Dose-Finding Clinical Trials | | 0
Page 34 of 66

No leaderboard results yet.