SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing alternatives ("arms") so as to maximize expected gain, where each arm's reward distribution is only partially known. These problems typically involve an exploration/exploitation trade-off: trying under-sampled arms to learn their rewards versus pulling the arm that currently looks best.
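As a concrete illustration of the exploration/exploitation trade-off, here is a minimal epsilon-greedy sketch on Bernoulli arms. This is a generic textbook baseline, not a method from any paper listed below; the arm means and parameter values are illustrative assumptions.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Minimal epsilon-greedy bandit on Bernoulli arms.

    With probability epsilon, explore a uniformly random arm;
    otherwise exploit the arm with the highest estimated mean.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # number of pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        # Bernoulli reward with the arm's (hidden) success probability
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean update
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return total_reward, estimates, counts
```

With a small epsilon, most pulls go to the empirically best arm while a constant fraction of rounds is reserved for exploration; UCB or Thompson sampling (both recurring in the paper titles below) replace this fixed exploration rate with data-dependent schedules.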

(Image credit: Microsoft Research)

Papers

Showing 251–300 of 1262 papers

Title | Status | Hype
Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits | | 0
Incentivized Exploration via Filtered Posterior Sampling | | 0
Thompson Sampling in Partially Observable Contextual Bandits | | 0
Efficient Prompt Optimization Through the Lens of Best Arm Identification | | 0
Diffusion Models Meet Contextual Bandits with Large Action Spaces | | 0
FLASH: Federated Learning Across Simultaneous Heterogeneities | | 0
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits | | 0
Stochastic contextual bandits with graph feedback: from independence number to MAS number | | 0
Contextual Multinomial Logit Bandits with General Value Functions | | 0
Efficient Contextual Bandits with Uninformed Feedback Graphs | | 0
Replicability is Asymptotically Free in Multi-armed Bandits | | 0
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning | | 0
Fast UCB-type algorithms for stochastic bandits with heavy and super heavy symmetric noise | | 0
Tree Ensembles for Contextual Bandits | | 0
Fairness of Exposure in Online Restless Multi-armed Bandits | Code | 0
Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits | Code | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | | 0
Fairness and Privacy Guarantees in Federated Contextual Bandits | | 0
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction | Code | 0
Multi-Armed Bandits with Interference | | 0
Query-Efficient Correlation Clustering with Noisy Oracle | | 0
Falcon: Fair Active Learning using Multi-armed Bandits | Code | 0
Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits | Code | 0
Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints | | 0
Adaptive Regret for Bandits Made Possible: Two Queries Suffice | | 0
On Quantum Natural Policy Gradients | | 0
Contextual Bandits with Stage-wise Constraints | | 0
Let's Get It Started: Fostering the Discoverability of New Releases on Deezer | Code | 0
Reliability-Optimized User Admission Control for URLLC Traffic: A Neural Contextual Bandit Approach | | 0
Optimal cross-learning for contextual bandits with unknown context distributions | | 0
Foundations of Reinforcement Learning and Interactive Decision Making | | 0
Best-of-Both-Worlds Linear Contextual Bandits | | 0
Harnessing the Power of Federated Learning in Federated Contextual Bandits | Code | 0
Diversity-Based Recruitment in Crowdsensing By Combinatorial Multi-Armed Bandits | | 0
Zero-Inflated Bandits | | 0
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits | | 0
Neural Contextual Bandits for Personalized Recommendation | | 0
In-Context Reinforcement Learning for Variable Action Spaces | Code | 1
Bayesian Analysis of Combinatorial Gaussian Process Bandits | | 0
Distribution-Dependent Rates for Multi-Distribution Learning | | 0
Observation-Augmented Contextual Multi-Armed Bandits for Robotic Search and Exploration | | 0
Best Arm Identification with Fixed Budget: A Large Deviation Perspective | Code | 0
Online Restless Multi-Armed Bandits with Long-Term Fairness Constraints | | 0
Risk-Aware Continuous Control with Neural Contextual Bandits | Code | 0
A Hierarchical Nearest Neighbour Approach to Contextual Bandits | | 0
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents | | 0
Contextual Bandits with Online Neural Regression | | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Distributed Optimization via Kernelized Multi-armed Bandits | | 0
Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits | Code | 0
Page 6 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
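For readers unfamiliar with the metric in the table above: cumulative regret is conventionally the sum, over rounds, of the gap between the best arm's expected reward and the expected reward of the arm actually played (lower is better). A minimal sketch, assuming the per-round arm means are known for evaluation purposes:

```python
def cumulative_regret(chosen_means, best_mean):
    """Cumulative (pseudo-)regret: sum over rounds of the gap
    between the optimal arm's mean and the chosen arm's mean."""
    return sum(best_mean - m for m in chosen_means)
```

For example, a policy that plays arms with means [0.5, 0.8, 0.8] when the best arm has mean 0.8 incurs cumulative regret 0.3, all from the first round. The exact evaluation protocol behind the numbers in this table is not specified on this page.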