SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. At each step, it chooses the action that maximizes the expected reward with respect to a belief drawn at random from the current posterior distribution.
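Below is a minimal sketch of the idea, assuming Bernoulli (0/1) rewards and an independent Beta(1, 1) prior on each arm's success probability. This is the classic Beta-Bernoulli setting and is not taken from any paper listed here. On each round the loop draws one sample per arm from that arm's Beta posterior, pulls the arm whose sample is largest, and updates that arm's Beta parameters with the observed reward. The function name `thompson_sampling` and the arm probabilities passed as `true_probs` are hypothetical, chosen only for illustration.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit (illustrative sketch).

    true_probs: hypothetical success probability of each arm, hidden from the agent.
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_probs)
    # Beta(1, 1) is the uniform prior on each arm's success probability.
    alpha = [1] * k  # 1 + number of observed successes
    beta = [1] * k   # 1 + number of observed failures
    pulls = [0] * k
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one belief per arm from its current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act greedily with respect to the sampled belief.
        arm = samples.index(max(samples))
        # Observe a Bernoulli reward from the simulated environment.
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate update: a success increments alpha, a failure increments beta.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


pulls, total = thompson_sampling([0.2, 0.5, 0.7], n_rounds=2000)
print(pulls, total)
```

Because each arm's posterior concentrates as that arm is pulled, arms with poor observed rewards produce the largest sample less and less often, so exploration tapers off without an explicit schedule such as a decaying epsilon.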

Papers

Showing 276–300 of 655 papers

Title | Status | Hype
Fourier Representations for Black-Box Optimization over Categorical Variables | | 0
On learning Whittle index policy for restless bandits with scalable regret | | 0
Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems | Code | 0
Tsetlin Machine for Solving Contextual Bandit Problems | Code | 0
Deep Hierarchy in Bandits | | 0
Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0
Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0
Modeling Human Exploration Through Resource-Rational Reinforcement Learning | Code | 0
Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems | | 0
IBAC: An Intelligent Dynamic Bandwidth Channel Access Avoiding Outside Warning Range Problem | | 0
On Dynamic Pricing with Covariates | | 0
Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization | | 0
Safe Linear Leveling Bandits | | 0
Risk and optimal policies in bandit experiments | | 0
Bayesian Optimization over Permutation Spaces | Code | 1
Observation-Free Attacks on Stochastic Bandits | | 0
Doubly Robust Thompson Sampling with Linear Payoffs | | 0
Optimizing Conditional Value-At-Risk of Black-Box Functions | Code | 0
Adaptive Gating for Single-Photon 3D Imaging | | 0
ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0
Hierarchical Bayesian Bandits | | 0
The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle | | 0
Maillard Sampling: Boltzmann Exploration Done Optimally | | 0
Online Learning of Energy Consumption for Navigation of Electric Vehicles | | 0
Efficient Inference Without Trading-off Regret in Bandits: An Allocation Probability Test for Thompson Sampling | | 0

No leaderboard results yet.