SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to the problem of allocating a fixed budget of resources among competing alternatives (arms) so as to maximize expected gain, when each arm's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: playing arms that look best so far versus sampling uncertain arms to learn more about them.
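The exploration/exploitation trade-off can be illustrated with epsilon-greedy, a standard textbook baseline (not tied to any particular paper listed below): with probability epsilon the learner explores a random arm, otherwise it exploits the arm with the best empirical mean. A minimal sketch, assuming Bernoulli-reward arms:

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Run epsilon-greedy on Bernoulli arms; return (mean estimates, total reward)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running empirical mean per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniformly random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return estimates, total_reward
```

With epsilon fixed, exploration never stops, so regret grows linearly; the algorithms in the papers below (UCB variants, Thompson sampling, etc.) shrink exploration adaptively to achieve sublinear regret.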

(Image credit: Microsoft Research)

Papers

Showing 626–650 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Optimal Algorithms for Stochastic Contextual Preference Bandits | | 0 |
| Identification of the Generalized Condorcet Winner in Multi-dueling Bandits | Code | 0 |
| Asymptotically Best Causal Effect Identification with Multi-Armed Bandits | | 0 |
| Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning | Code | 0 |
| Bandits with Knapsacks beyond the Worst Case | | 0 |
| Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization | | 0 |
| Online Fair Revenue Maximizing Cake Division with Non-Contiguous Pieces in Adversarial Bandits | | 0 |
| Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization | Code | 1 |
| Decentralized Upper Confidence Bound Algorithms for Homogeneous Multi-Agent Multi-Armed Bandits | | 0 |
| Offline Contextual Bandits for Wireless Network Optimization | | 0 |
| An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit | | 0 |
| Universal and data-adaptive algorithms for model selection in linear contextual bandits | | 0 |
| Empirical analysis of representation learning and exploration in neural kernel bandits | Code | 0 |
| Privacy-Preserving Communication-Efficient Federated Multi-Armed Bandits | | 0 |
| Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits | | 0 |
| Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure | | 0 |
| (Almost) Free Incentivized Exploration from Decentralized Learning Agents | Code | 0 |
| Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization | Code | 0 |
| Federated Linear Contextual Bandits | | 0 |
| The Pareto Frontier of model selection for general Contextual Bandits | | 0 |
| Linear Contextual Bandits with Adversarial Corruptions | | 0 |
| Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits | | 0 |
| Towards the D-Optimal Online Experiment Design for Recommender Selection | Code | 0 |
| Dynamic pricing and assortment under a contextual MNL demand | | 0 |
| Stateful Offline Contextual Policy Evaluation and Learning | | 0 |
Page 26 of 51

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |
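The cumulative-regret metric above measures how much expected reward an algorithm forfeits relative to always playing the best arm. A minimal sketch of the standard definition (the arm means and play sequence here are illustrative, not taken from the benchmark):

```python
def cumulative_regret(true_means, arms_played):
    """Expected cumulative regret: sum over rounds of (best arm mean - chosen arm mean)."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in arms_played)

# Example: two arms with means 0.9 and 0.5; playing [0, 1, 1] forfeits 0.4 twice.
regret = cumulative_regret([0.9, 0.5], [0, 1, 1])  # → 0.8
```

Lower is better, so the Linear model's claimed 1.82 would narrowly beat NeuralLinear's 1.92 on this benchmark if verified.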