SOTAVerified

Multi-Armed Bandits

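The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms whose payoffs are still uncertain against repeatedly choosing the arm that currently looks best.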
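
As a rough illustration of the exploration/exploitation trade-off, the sketch below simulates an epsilon-greedy strategy on a Bernoulli bandit. It is only one simple baseline and is not taken from any paper listed on this page. The function name `epsilon_greedy`, its parameters, and the arm success probabilities are hypothetical choices made for the example.

```python
import random


def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit.

    true_means: assumed per-arm success probabilities (hidden from the learner).
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of times each arm was pulled
    estimates = [0.0] * n_arms   # running average reward of each arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the running average for that arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    # Illustrative arm probabilities; the last arm is the best one.
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

With a small `epsilon`, most pulls should concentrate on the arm with the highest estimated mean, while the occasional random pull keeps the other estimates from going stale. A larger `epsilon` would explore more at the cost of pulling inferior arms more often.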


Papers

Showing 951–1000 of 1262 papers

Title | Status | Hype
The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits | | 0
Are sample means in multi-armed bandits positively or negatively biased? | | 0
Cramming Contextual Bandits for On-policy Statistical Evaluation | | 0
The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information | | 0
The Externalities of Exploration and How Data Diversity Helps Exploitation | | 0
The K-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates | | 0
The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication | | 0
The Pareto Frontier of model selection for general Contextual Bandits | | 0
The Price of Differential Privacy For Online Learning | | 0
Thompson Sampling for Budgeted Multi-armed Bandits | | 0
Thompson Sampling Algorithms for Cascading Bandits | | 0
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | | 0
Thompson sampling for improved exploration in GFlowNets | | 0
Thompson Sampling for Unsupervised Sequential Selection | | 0
Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study | | 0
Thompson Sampling in Partially Observable Contextual Bandits | | 0
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards | | 0
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits | | 0
Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits | | 0
Tight Lower Bounds for Combinatorial Multi-Armed Bandits | | 0
Tight Regret Bounds for Infinite-armed Linear Contextual Bandits | | 0
Top-K Ranking Deep Contextual Bandits for Information Selection Systems | | 0
To update or not to update? Delayed Nonparametric Bandits with Randomized Allocation | | 0
Towards Distribution-Free Multi-Armed Bandits with Combinatorial Strategies | | 0
Towards Domain Adaptive Neural Contextual Bandits | | 0
Towards More Efficient, Robust, Instance-adaptive, and Generalizable Sequential Decision making | | 0
Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information | | 0
Towards Robust Off-Policy Evaluation via Human Inputs | | 0
Towards Soft Fairness in Restless Multi-Armed Bandits | | 0
Towards Understanding the Benefit of Multitask Representation Learning in Decision Process | | 0
Towards a Pretrained Model for Restless Bandits via Multi-arm Generalization | | 0
Tracking Most Significant Shifts in Nonparametric Contextual Bandits | | 0
Tractable contextual bandits beyond realizability | | 0
Transfer in Sequential Multi-armed Bandits via Reward Samples | | 0
Transfer Learning for Contextual Multi-armed Bandits | | 0
Transfer Learning in Bandits with Latent Continuity | | 0
Tree Ensembles for Contextual Bandits | | 0
Trend Detection based Regret Minimization for Bandit Problems | | 0
Trend-responsive User Segmentation Enabling Traceable Publishing Insights. A Case Study of a Real-world Large-scale News Recommendation System | | 0
Triply Robust Off-Policy Evaluation | | 0
TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation | | 0
UCB algorithms for multi-armed bandits: Precise regret and adaptive inference | | 0
Understanding Memory-Regret Trade-Off for Streaming Stochastic Multi-Armed Bandits | | 0
Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits | | 0
Unifying Clustered and Non-stationary Bandits | | 0
uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs | | 0
Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms | | 0
Universal and data-adaptive algorithms for model selection in linear contextual bandits | | 0
Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems | | 0
Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits | | 0
Page 20 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified