SOTAVerified

Thompson Sampling

Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
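The "randomly drawn belief" step is what distinguishes Thompson sampling from greedy strategies: instead of acting on the posterior mean, the agent samples one plausible reward parameter per arm from its posterior and acts as if that sample were the truth. A minimal sketch for Bernoulli-reward bandits with Beta posteriors (the function name, arm probabilities, and round count here are illustrative, not from any particular paper above):

```python
import random

def thompson_sampling(arm_probs, rounds=10_000, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated bandit.

    arm_probs: true (hidden) success probability of each arm.
    Returns per-arm (successes, failures) counts including the Beta(1,1) prior.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    successes = [1] * k  # Beta(1, 1) uniform prior on each arm
    failures = [1] * k
    for _ in range(rounds):
        # Draw one sample from each arm's posterior belief ...
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # ... and play the arm whose sampled mean reward is highest.
        arm = max(range(k), key=samples.__getitem__)
        # Observe a Bernoulli reward and update that arm's posterior.
        if rng.random() < arm_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

succ, fail = thompson_sampling([0.2, 0.5, 0.8])
pulls = [s + f - 2 for s, f in zip(succ, fail)]  # subtract the prior pseudo-counts
```

Because the posterior for a clearly inferior arm concentrates below the best arm's samples, exploration of bad arms decays automatically; after enough rounds, nearly all pulls go to the arm with the highest true success probability.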

Papers

Showing 51–75 of 655 papers

Title | Status | Hype
Optimizing Posterior Samples for Bayesian Optimization via Rootfinding | Code | 1
Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program | | 0
BanditCAT and AutoIRT: Machine Learning Approaches to Computerized Adaptive Testing and Item Calibration | | 0
Robust Thompson Sampling Algorithms Against Reward Poisoning Attacks | | 0
Distributed Thompson sampling under constrained communication | Code | 0
Aligning AI Agents via Information-Directed Sampling | | 0
Queueing Matching Bandits with Preference Feedback | Code | 0
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing | | 0
Gaussian Process Thompson Sampling via Rootfinding | | 0
Batched Bayesian optimization by maximizing the probability of including the optimum | Code | 1
Contextual Bandits with Non-Stationary Correlated Rewards for User Association in MmWave Vehicular Networks | | 0
Thompson Sampling For Combinatorial Bandits: Polynomial Regret and Mismatched Sampling Paradox | Code | 0
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling | | 0
Improving Portfolio Optimization Results with Bandit Networks | Code | 0
Partially Observable Contextual Bandits with Linear Payoffs | | 0
Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis | | 0
Sliding-Window Thompson Sampling for Non-Stationary Settings | | 0
Multi-Task Combinatorial Bandits for Budget Allocation | | 0
An Extremely Data-efficient and Generative LLM-based Reinforcement Learning Agent for Recommenders | | 0
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | | 0
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications | | 0
Constructing Adversarial Examples for Vertical Federated Learning: Optimal Client Corruption through Multi-Armed Bandit | Code | 0
Optimization-Driven Adaptive Experimentation | | 0
Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic | Code | 0
Process-constrained batch Bayesian approaches for yield optimization in multi-reactor systems | Code | 0
Page 3 of 27
