SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
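
As a rough illustration of the definition above, the following sketch applies Thompson sampling to a Bernoulli bandit. It rests on assumptions that are not part of this page: rewards are 0 or 1, each arm starts with a uniform Beta(1, 1) prior, and the function name `thompson_sampling_bernoulli` and its parameters are hypothetical names chosen for the example. In each round it draws one belief per arm from that arm's posterior and plays the arm whose drawn belief is largest.

```python
import random


def thompson_sampling_bernoulli(true_probs, n_rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch only).

    true_probs: hypothetical success probability of each arm (hidden from the
                learner, used only to simulate rewards).
    Returns per-arm counts of pulls, successes, and failures.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Randomly drawn belief: one sample per arm from its Beta posterior,
        # assuming a uniform Beta(1, 1) prior.
        beliefs = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Act greedily with respect to the drawn belief.
        arm = max(range(n_arms), key=beliefs.__getitem__)

        # Simulate a 0/1 reward and update that arm's posterior counts.
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, successes, failures = thompson_sampling_bernoulli([0.1, 0.9], 2000)
    print("pulls per arm:", pulls)
```

Because each arm's sample comes from its own posterior, arms with little data produce widely spread samples and are still tried occasionally (exploration), while arms with strong evidence of high reward win most draws (exploitation).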

Papers

Showing 11-20 of 655 papers

Title | Status | Hype
Representative Action Selection for Large Action-Space Meta-Bandits | Code | 0
Deconfounded Warm-Start Thompson Sampling with Applications to Precision Medicine | | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | | 0
Generator-Mediated Bandits: Thompson Sampling for GenAI-Powered Adaptive Interventions | | 0
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits | | 0
Steering Generative Models with Experimental Data for Protein Fitness Optimization | Code | 1
Dynamic Decision-Making under Model Misspecification | | 0
Addressing Missing Data Issue for Diffusion-based Recommendation | Code | 0
Thompson Sampling-like Algorithms for Stochastic Rising Bandits | | 0
Leveraging Offline Data from Similar Systems for Online Linear Quadratic Control | | 0

No leaderboard results yet.