SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed, limited amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off.
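The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch on Bernoulli arms (the function name and parameters below are illustrative, not taken from any listed paper): with probability epsilon the agent explores a random arm, otherwise it exploits the arm with the highest estimated mean reward.

```python
import random

def epsilon_greedy(true_means, steps=10000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit on Bernoulli arms: with probability
    epsilon pull a random arm (explore), otherwise pull the arm with the
    highest current estimated mean (exploit)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # number of pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0   # Bernoulli draw
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm] # incremental mean
    return counts, estimates
```

Over many steps the best arm accumulates most of the pulls, while the small epsilon keeps a trickle of exploration so poorly-estimated arms are still revisited.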

(Image credit: Microsoft Research)

Papers

Showing 276–300 of 1262 papers

Title | Status | Hype
Batched Nonparametric Contextual Bandits | | 0
Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery | Code | 0
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement | | 0
Optimistic Information Directed Sampling | | 0
Multi-Armed Bandits with Abstention | | 0
A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health | | 0
Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits | | 0
Incentivized Exploration via Filtered Posterior Sampling | | 0
Diffusion Models Meet Contextual Bandits with Large Action Spaces | | 0
Thompson Sampling in Partially Observable Contextual Bandits | | 0
Efficient Prompt Optimization Through the Lens of Best Arm Identification | | 0
FLASH: Federated Learning Across Simultaneous Heterogeneities | | 0
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits | | 0
Replicability is Asymptotically Free in Multi-armed Bandits | | 0
Contextual Multinomial Logit Bandits with General Value Functions | | 0
Efficient Contextual Bandits with Uninformed Feedback Graphs | | 0
Stochastic contextual bandits with graph feedback: from independence number to MAS number | | 0
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning | | 0
Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise | | 0
Tree Ensembles for Contextual Bandits | | 0
Fairness of Exposure in Online Restless Multi-armed Bandits | Code | 0
Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits | Code | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | | 0
Fairness and Privacy Guarantees in Federated Contextual Bandits | | 0
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction | Code | 0
Page 12 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
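The benchmark metric above, cumulative regret, is the total expected reward lost by not pulling the best arm at every step. A minimal sketch of how it is computed (the function name and inputs are illustrative assumptions, not this site's evaluation code):

```python
def cumulative_regret(pulled_means, best_mean):
    """Expected cumulative regret: sum over the run of the gap between the
    best arm's mean reward and the mean of the arm actually pulled.

    pulled_means -- mean reward of the arm chosen at each step
    best_mean    -- mean reward of the optimal arm
    """
    return sum(best_mean - m for m in pulled_means)
```

A policy that always pulls the best arm incurs zero regret; each suboptimal pull adds that arm's gap to the total, so lower values (as in the table) are better.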