SOTAVerified

Multi-Armed Bandits

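Multi-armed bandits are a class of sequential decision problems in which a fixed, limited set of resources must be allocated among competing choices (arms) so as to maximize expected gain. Because each arm's payoff is only partially known when the allocation is made, these problems typically involve an exploration/exploitation trade-off: gathering more information about uncertain arms versus playing the arm that currently looks best.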
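As a rough illustration of the exploration/exploitation trade-off, here is a minimal sketch of an epsilon-greedy strategy on simulated Bernoulli arms. Epsilon-greedy is only one simple baseline and is not taken from any of the papers listed on this page; the function name `run_epsilon_greedy`, the reward model, and all parameter values are assumptions made for the example.

```python
import random


def run_epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (hypothetical helper).

    true_means: success probability of each arm (assumed, unknown to the learner).
    Returns (pull counts, estimated means, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Pseudo-regret: expected gap to the best arm for this pull.
        regret += best_mean - true_means[arm]

    return counts, estimates, regret


if __name__ == "__main__":
    counts, estimates, regret = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("cumulative regret:", round(regret, 1))
```

With a fixed `epsilon`, a constant fraction of pulls is always spent exploring, so cumulative regret keeps growing linearly in the number of rounds. That is one reason much of the literature studies adaptive schemes such as UCB or Thompson sampling instead.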


Papers

Showing 851–900 of 1262 papers

Title | Status | Hype
Risk-Aware Algorithms for Adversarial Contextual Bandits | | 0
Risk-aware linear bandits with convex loss | | 0
Concentration bounds for CVaR estimation: The cases of light-tailed and heavy-tailed distributions | | 0
Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents | | 0
Robust Contextual Linear Bandits | | 0
Exploiting Heterogeneity in Robust Federated Best-Arm Identification | | 0
Robust Generalization of Quadratic Neural Networks via Function Identification | | 0
Robust Multi-Agent Multi-Armed Bandits | | 0
Robustness Guarantees for Mode Estimation with an Application to Bandits | | 0
Robust Pareto Set Identification with Contaminated Bandit Feedback | | 0
Restless and Uncertain: Robust Policies for Restless Bandits via Deep Multi-Agent Reinforcement Learning | | 0
Robust Stochastic Linear Contextual Bandits Under Adversarial Attacks | | 0
Rotting Bandits | | 0
Rotting bandits are not harder than stochastic ones | | 0
Safe Linear Leveling Bandits | | 0
Safety-Aware Algorithms for Adversarial Contextual Bandit | | 0
The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity | | 0
Sample complexity of partition identification using multi-armed bandits | | 0
Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning | | 0
Satisficing Exploration for Deep Reinforcement Learning | | 0
Scalable and Interpretable Contextual Bandits: A Literature Review and Retail Offer Prototype | | 0
Scalable Discrete Sampling as a Multi-Armed Bandit Problem | | 0
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees | | 0
Scale Free Adversarial Multi Armed Bandits | | 0
Scaling Multi-Armed Bandit Algorithms | | 0
Second Order Bounds for Contextual Bandits with Function Approximation | | 0
Selecting the best system and multi-armed bandits | | 0
Selective Harvesting over Networks | | 0
Selective Intervention Planning using Restless Multi-Armed Bandits to Improve Maternal and Child Health Outcomes | | 0
Selectively Contextual Bandits | | 0
Selective Reviews of Bandit Problems in AI via a Statistical View | | 0
Selfish Robustness and Equilibria in Multi-Player Bandits | | 0
Self-Supervised Contextual Bandits in Computer Vision | | 0
Self-Tuning Bandits over Unknown Covariate-Shifts | | 0
Semantic Parsing for Planning Goals as Constrained Combinatorial Contextual Bandits | | 0
Semi-Parametric Batched Global Multi-Armed Bandits with Covariates | | 0
Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization | | 0
Sequential Batch Learning in Finite-Action Linear Contextual Bandits | | 0
Sequential Best-Arm Identification with Application to Brain-Computer Interface | | 0
Constrained Restless Bandits for Dynamic Scheduling in Cyber-Physical Systems | | 0
Sequential Design for Ranking Response Surfaces | | 0
Sequential Monte Carlo Bandits | | 0
Settling the Communication Complexity for Distributed Offline Reinforcement Learning | | 0
SHAP@k: Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features | | 0
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF | | 0
Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms | | 0
Shuffle Private Linear Contextual Bandits | | 0
Simple Regret Minimization for Contextual Bandits | | 0
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition | | 0
Skyline Identification in Multi-Armed Bandits | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified