
Multi-Armed Bandits

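Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices (the "arms") so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms whose payoff is still uncertain against repeatedly choosing the arm that currently looks best.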
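The trade-off described above can be illustrated with a minimal sketch of epsilon-greedy, one common bandit strategy (it is not singled out by this page). The Bernoulli reward model, the arm means, the value of `epsilon`, and the function name `epsilon_greedy` are all assumptions chosen for illustration.

```python
import random


def epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms (hypothetical setup).

    Returns (pull counts, estimated means, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running mean for the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


# Illustrative arm means; the learner never sees these directly.
true_means = [0.2, 0.5, 0.8]
counts, estimates, total_reward = epsilon_greedy(true_means)

# Regret relative to always pulling the best arm (in expectation).
regret = 5000 * max(true_means) - total_reward
```

A larger `epsilon` spends more pulls on exploration and lowers the risk of locking onto a poor arm, at the cost of more rounds spent on arms already known to be worse.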


Papers

Showing 251–300 of 1262 papers

Title | Status | Hype
Leveraging (Biased) Information: Multi-armed Bandits with Offline Data |  | 0
Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery |  | 0
Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback |  | 0
Recommenadation aided Caching using Combinatorial Multi-armed Bandits |  | 0
Disentangling Exploration from Exploitation |  | 0
Causally Abstracted Multi-armed Bandits | Code | 0
Structured Reinforcement Learning for Delay-Optimal Data Transmission in Dense mmWave Networks |  | 0
Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity | Code | 0
Generalized Linear Bandits with Limited Adaptivity | Code | 0
Feel-Good Thompson Sampling for Contextual Dueling Bandits |  | 0
On the Importance of Uncertainty in Decision-Making with Large Language Models |  | 0
Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy |  | 0
Nearly-tight Approximation Guarantees for the Improving Multi-Armed Bandits Problem |  | 0
A Correction of Pseudo Log-Likelihood Method |  | 0
Contextual Restless Multi-Armed Bandits with Application to Demand Response Decision-Making |  | 0
Transfer in Sequential Multi-armed Bandits via Reward Samples |  | 0
Phasic Diversity Optimization for Population-Based Reinforcement Learning |  | 0
Cramming Contextual Bandits for On-policy Statistical Evaluation |  | 0
ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment |  | 0
Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning |  | 0
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits |  | 0
A General Reduction for High-Probability Analysis with General Light-Tailed Distributions |  | 0
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds |  | 0
Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain |  | 0
Federated Linear Contextual Bandits with Heterogeneous Clients |  | 0
Batched Nonparametric Contextual Bandits |  | 0
Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery | Code | 0
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement |  | 0
Optimistic Information Directed Sampling |  | 0
Multi-Armed Bandits with Abstention |  | 0
A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health |  | 0
Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits |  | 0
Incentivized Exploration via Filtered Posterior Sampling |  | 0
Diffusion Models Meet Contextual Bandits with Large Action Spaces |  | 0
Thompson Sampling in Partially Observable Contextual Bandits |  | 0
Efficient Prompt Optimization Through the Lens of Best Arm Identification |  | 0
FLASH: Federated Learning Across Simultaneous Heterogeneities |  | 0
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits |  | 0
Replicability is Asymptotically Free in Multi-armed Bandits |  | 0
Contextual Multinomial Logit Bandits with General Value Functions |  | 0
Efficient Contextual Bandits with Uninformed Feedback Graphs |  | 0
Stochastic contextual bandits with graph feedback: from independence number to MAS number |  | 0
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning |  | 0
Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise |  | 0
Tree Ensembles for Contextual Bandits |  | 0
Fairness of Exposure in Online Restless Multi-armed Bandits | Code | 0
Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits | Code | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits |  | 0
Fairness and Privacy Guarantees in Federated Contextual Bandits |  | 0
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 |  | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 |  | Unverified