SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed amount of resources must be allocated among competing choices (the "arms") so as to maximize expected gain, when each choice's reward distribution is only partially known and becomes better understood as it is sampled. These problems typically involve an exploration/exploitation trade-off: trying arms to learn their rewards versus repeatedly pulling the arm that currently looks best.

(Image credit: Microsoft Research)
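As a concrete illustration of the trade-off described above, here is a minimal sketch of the epsilon-greedy strategy on a Bernoulli bandit. This is one standard baseline, not a method from any paper listed below; the function name, arm probabilities, and parameter values are illustrative.

```python
import random

def epsilon_greedy(true_means, n_rounds=10000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit.

    true_means holds the hidden success probability of each arm;
    the agent only observes sampled 0/1 rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)          # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the arm's empirical mean
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, counts, total_reward

values, counts, total = epsilon_greedy([0.2, 0.5, 0.7])
```

With a small exploration rate, the agent quickly concentrates its pulls on the best arm (here the one with mean 0.7) while still occasionally sampling the others; algorithms such as UCB and Thompson sampling, which appear throughout the papers below, refine how that exploration is scheduled.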

Papers

Showing 251–275 of 1262 papers

Title | Status | Hype
Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits | | 0
Incentivized Exploration via Filtered Posterior Sampling | | 0
Thompson Sampling in Partially Observable Contextual Bandits | | 0
Diffusion Models Meet Contextual Bandits with Large Action Spaces | | 0
Efficient Prompt Optimization Through the Lens of Best Arm Identification | | 0
FLASH: Federated Learning Across Simultaneous Heterogeneities | | 0
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits | | 0
Stochastic contextual bandits with graph feedback: from independence number to MAS number | | 0
Efficient Contextual Bandits with Uninformed Feedback Graphs | | 0
Contextual Multinomial Logit Bandits with General Value Functions | | 0
Replicability is Asymptotically Free in Multi-armed Bandits | | 0
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning | | 0
Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise | | 0
Tree Ensembles for Contextual Bandits | | 0
Fairness of Exposure in Online Restless Multi-armed Bandits | Code | 0
Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits | Code | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | | 0
Fairness and Privacy Guarantees in Federated Contextual Bandits | | 0
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction | Code | 0
Multi-Armed Bandits with Interference | | 0
Query-Efficient Correlation Clustering with Noisy Oracle | | 0
Falcon: Fair Active Learning using Multi-armed Bandits | Code | 0
Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints | | 0
Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits | Code | 0
Adaptive Regret for Bandits Made Possible: Two Queries Suffice | | 0
Page 11 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified