
Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain, when each choice's properties are only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying under-explored arms against repeatedly pulling the arm that currently looks best.

(Image credit: Microsoft Research)
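
The exploration/exploitation trade-off above can be made concrete with a minimal ε-greedy sketch in Python. Everything here (the arm count, the Bernoulli reward probabilities, and ε = 0.1) is an illustrative assumption, not the method of any paper listed below.

```python
import random

# Minimal epsilon-greedy bandit agent: an illustrative sketch, not the
# algorithm of any specific paper on this page.
class EpsilonGreedyBandit:
    def __init__(self, n_arms: int, epsilon: float = 0.1):
        self.epsilon = epsilon        # probability of exploring at each step
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def select_arm(self) -> int:
        # Explore uniformly with probability epsilon; otherwise exploit
        # the arm with the highest estimated mean reward.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, arm: int, reward: float) -> None:
        # Incremental update of the running mean for the pulled arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Toy simulation with three Bernoulli arms (success probabilities made up).
probs = [0.2, 0.5, 0.7]
agent = EpsilonGreedyBandit(n_arms=len(probs))
for _ in range(10_000):
    arm = agent.select_arm()
    reward = 1.0 if random.random() < probs[arm] else 0.0
    agent.update(arm, reward)
print(agent.values)  # estimates approach [0.2, 0.5, 0.7]
```

A larger ε explores more but wastes more pulls on inferior arms; annealing ε toward zero over time is a common refinement.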

Papers

Showing 1051–1100 of 1262 papers

Title | Status | Hype
Adaptive Endpointing with Deep Contextual Multi-armed Bandits |  | 0
Adaptive Exploration in Linear Contextual Bandit |  | 0
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds |  | 0
Adaptively Learning to Select-Rank in Online Platforms |  | 0
Adaptive Regret for Bandits Made Possible: Two Queries Suffice |  | 0
Adaptive, Robust and Scalable Bayesian Filtering for Online Learning |  | 0
ADARES: Adaptive Resource Management for Virtual Machines |  | 0
A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health |  | 0
Bandits with Knapsacks beyond the Worst-Case |  | 0
Adversarial Attacks on Adversarial Bandits |  | 0
Adversarial Attacks on Cooperative Multi-agent Bandits |  | 0
Adversarial Attacks on Linear Contextual Bandits |  | 0
Adversarial Bandits with Knapsacks |  | 0
Adversarial Contextual Bandits Go Kernelized |  | 0
Adversarial Linear Contextual Bandits with Graph-Structured Side Observations |  | 0
α-Fair Contextual Bandits |  | 0
A Farewell to Arms: Sequential Reward Maximization on a Budget with a Giving Up Option |  | 0
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation |  | 0
A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback |  | 0
A framework for optimizing COVID-19 testing policy using a Multi Armed Bandit approach |  | 0
A Gang of Bandits |  | 0
A General Framework for Bandit Problems Beyond Cumulative Objectives |  | 0
A General Framework for Off-Policy Learning with Partially-Observed Reward |  | 0
A General Theory of the Stochastic Linear Bandit and Its Applications |  | 0
A Hierarchical Nearest Neighbour Approach to Contextual Bandits |  | 0
A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization |  | 0
A KL-LUCB algorithm for Large-Scale Crowdsourcing |  | 0
Algorithms for Differentially Private Multi-Armed Bandits |  | 0
Algorithms for multi-armed bandit problems |  | 0
Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits |  | 0
Almost Boltzmann Exploration |  | 0
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits |  | 0
A Model Selection Approach for Corruption Robust Reinforcement Learning |  | 0
An Adaptive Method for Contextual Stochastic Multi-armed Bandits with Rewards Generated by a Linear Dynamical System |  | 0
Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits |  | 0
An Analysis of Reinforcement Learning for Malaria Control |  | 0
An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits |  | 0
A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits |  | 0
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives |  | 0
An Efficient Algorithm for Deep Stochastic Contextual Bandits |  | 0
An Empirical Evaluation of Federated Contextual Bandit Algorithms |  | 0
An Empirical Evaluation of Thompson Sampling |  | 0
A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free |  | 0
A New Benchmark for Online Learning with Budget-Balancing Constraints |  | 0
An Exploration-free Method for a Linear Stochastic Bandit Driven by a Linear Gaussian Dynamical System |  | 0
An Improved Relaxation for Oracle-Efficient Adversarial Contextual Bandits |  | 0
An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit |  | 0
An Instrumental Value for Data Production and its Application to Data Pricing |  | 0
An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays |  | 0
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits |  | 0
Page 22 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 |  | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 |  | Unverified
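
Cumulative regret, the metric reported above, is the gap between the reward an oracle that always plays the best arm would accumulate and the reward the algorithm actually accumulates: R_T = sum over t = 1..T of (mu* - mu_{a_t}), where mu* is the best arm's mean and a_t is the arm pulled at step t. Below is a minimal sketch of that computation, with made-up arm means and an illustrative action sequence; it is not the site's verification code.

```python
# Cumulative (pseudo-)regret of a sequence of pulled arms,
# given the true mean reward of each arm. Illustrative sketch only.
def cumulative_regret(arm_means: list[float], actions: list[int]) -> float:
    best = max(arm_means)  # mean reward of the optimal arm
    return sum(best - arm_means[a] for a in actions)

# Example: three arms with made-up means, five pulls.
print(cumulative_regret([0.2, 0.5, 0.7], [0, 1, 2, 2, 2]))  # 0.5 + 0.2 = 0.7
```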