SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
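As a rough illustration of the description above, here is a minimal sketch of Thompson sampling for a Bernoulli bandit. It assumes binary rewards and an independent uniform Beta(1, 1) prior on each arm. These choices, the function name `thompson_sampling`, and the example reward probabilities are illustrative assumptions and do not come from any of the listed papers.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Run Beta-Bernoulli Thompson sampling on a simulated bandit.

    true_probs: hypothetical per-arm success probabilities (unknown to the agent).
    Returns (pulls, successes, failures), one entry per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior
        # (assumed prior: Beta(1, 1), i.e. uniform on [0, 1]).
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Act greedily with respect to the sampled belief.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a Bernoulli reward and update that arm's counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling([0.2, 0.5, 0.8], 2000)
    print("pulls per arm:", pulls)
```

Because each action is chosen by sampling from the posterior rather than by using its mean, arms with few observations are still tried occasionally, while arms with consistently high observed reward come to dominate the pull counts.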

Papers

Showing 641–650 of 655 papers

Title | Status | Hype
Thompson Sampling with Information Relaxation Penalties | Code | 0
Efficient Optimal Selection for Composited Advertising Creatives with Tree Structure | Code | 0
Odds-Ratio Thompson Sampling to Control for Time-Varying Effect | Code | 0
Old Dog Learns New Tricks: Randomized UCB for Bandit Problems | Code | 0
Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0
Trajectory-oriented optimization of stochastic epidemiological models | Code | 0
On Bits and Bandits: Quantifying the Regret-Information Trade-off | Code | 0
Learning to Play Imperfect-Information Games by Imitating an Oracle Planner | Code | 0
Process-constrained batch Bayesian approaches for yield optimization in multi-reactor systems | Code | 0
ESCADA: Efficient Safety and Context Aware Dose Allocation for Precision Medicine | Code | 0

No leaderboard results yet.