SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed amount of resources among competing choices (arms) so as to maximize expected gain, when each arm's payoff is initially unknown and can only be learned by playing it. These problems typically involve an exploration/exploitation trade-off.
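
To make the trade-off concrete, here is a minimal epsilon-greedy sketch on a Bernoulli bandit (illustrative only; the arm payoff probabilities, epsilon, and horizon are arbitrary assumptions, not taken from any paper on this page):

```python
import random

def epsilon_greedy(arm_probs, epsilon=0.1, horizon=10_000):
    """Play a Bernoulli bandit: explore with probability epsilon, else exploit."""
    k = len(arm_probs)
    counts = [0] * k      # pulls per arm
    values = [0.0] * k    # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if random.random() < epsilon:
            arm = random.randrange(k)                      # explore: random arm
        else:
            arm = max(range(k), key=lambda i: values[i])   # exploit: best mean so far
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return total_reward, values

# Hypothetical three-armed bandit; the learner does not know these probabilities.
reward, estimates = epsilon_greedy([0.3, 0.5, 0.7])
print(reward, estimates)
```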

(Image credit: Microsoft Research)

Papers

Showing 951–975 of 1262 papers

Title | Status | Hype
The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits | | 0
Are sample means in multi-armed bandits positively or negatively biased? | | 0
Cramming Contextual Bandits for On-policy Statistical Evaluation | | 0
The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information | | 0
The Externalities of Exploration and How Data Diversity Helps Exploitation | | 0
The K-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates | | 0
The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication | | 0
The Pareto Frontier of model selection for general Contextual Bandits | | 0
The Price of Differential Privacy For Online Learning | | 0
Thompson Sampling for Budgeted Multi-armed Bandits | | 0
Thompson Sampling Algorithms for Cascading Bandits | | 0
Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints | | 0
Thompson sampling for improved exploration in GFlowNets | | 0
Thompson Sampling for Unsupervised Sequential Selection | | 0
Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study | | 0
Thompson Sampling in Partially Observable Contextual Bandits | | 0
Thompson Sampling Regret Bounds for Contextual Bandits with sub-Gaussian rewards | | 0
Thresholding Data Shapley for Data Cleansing Using Multi-Armed Bandits | | 0
Tight Gap-Dependent Memory-Regret Trade-Off for Single-Pass Streaming Stochastic Multi-Armed Bandits | | 0
Tight Lower Bounds for Combinatorial Multi-Armed Bandits | | 0
Tight Regret Bounds for Infinite-armed Linear Contextual Bandits | | 0
Top-K Ranking Deep Contextual Bandits for Information Selection Systems | | 0
To update or not to update? Delayed Nonparametric Bandits with Randomized Allocation | | 0
Towards Distribution-Free Multi-Armed Bandits with Combinatorial Strategies | | 0
Towards Domain Adaptive Neural Contextual Bandits | | 0
Page 39 of 51
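
Thompson sampling, which recurs throughout the titles above, handles the same trade-off by maintaining a posterior over each arm's payoff and playing the arm with the highest posterior sample. A minimal Beta-Bernoulli sketch under the same assumed setup (illustrative, not drawn from any listed paper):

```python
import random

def thompson_sampling(arm_probs, horizon=10_000):
    """Beta-Bernoulli Thompson sampling: sample each arm's posterior,
    play the argmax, then update that arm's Beta(successes+1, failures+1)."""
    k = len(arm_probs)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0.0
    for _ in range(horizon):
        # One posterior draw per arm; exploration comes from posterior
        # spread, so no explicit epsilon is needed.
        samples = [random.betavariate(successes[i] + 1, failures[i] + 1)
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1.0 if random.random() < arm_probs[arm] else 0.0
        successes[arm] += int(reward)
        failures[arm] += int(1 - reward)
        total_reward += reward
    return total_reward

print(thompson_sampling([0.3, 0.5, 0.7]))
```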

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
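
The metric in this table, cumulative regret, measures how much expected reward a policy gives up relative to always playing the best arm. A small sketch of the standard computation (the arm means and the played sequence below are hypothetical):

```python
def cumulative_regret(arm_probs, chosen_arms):
    """Expected cumulative regret: the optimal arm's mean minus the chosen
    arm's mean, summed over every round of the run."""
    best = max(arm_probs)
    return sum(best - arm_probs[arm] for arm in chosen_arms)

# Hypothetical run of a policy that mostly converged on arm 2 (mean 0.7).
print(cumulative_regret([0.3, 0.5, 0.7], [0, 1, 2, 2, 2, 1, 2]))
```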