
Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed budget of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: pulling arms that look best so far versus trying less-explored arms to learn more about them.

(Image credit: Microsoft Research)
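To illustrate the exploration/exploitation trade-off, here is a minimal sketch (not taken from any paper listed on this page) of Thompson Sampling on a Bernoulli bandit; the arm reward probabilities and horizon are hypothetical example values.

    # Minimal Thompson Sampling sketch on a Bernoulli multi-armed bandit.
    # Assumption: arm probabilities and horizon below are hypothetical.
    import random

    true_probs = [0.3, 0.5, 0.7]        # hypothetical reward probability per arm
    successes = [0] * len(true_probs)   # Beta posterior counts of reward = 1
    failures = [0] * len(true_probs)    # Beta posterior counts of reward = 0

    for t in range(10_000):
        # Sample a plausible reward rate for each arm from its Beta posterior,
        # then play the arm with the highest sample (randomness drives exploration).
        samples = [random.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(len(true_probs))]
        arm = max(range(len(true_probs)), key=lambda a: samples[a])

        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if random.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward

    print("plays per arm:", [successes[a] + failures[a] for a in range(len(true_probs))])

Over time the posterior for the best arm concentrates, so it is played increasingly often while suboptimal arms are still sampled occasionally.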

Papers

Showing papers 1051–1075 of 1262

Title | Status | Hype
ADARES: Adaptive Resource Management for Virtual Machines | - | 0
A Bandit Approach to Sequential Experimental Design with False Discovery Control | - | 0
Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward | - | 0
Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task | - | 0
Stochastic Top-K Subset Bandits with Linear Space and Non-Linear Feedback | - | 0
Adversarial Bandits with Knapsacks | - | 0
Kernel-based Multi-Task Contextual Bandits in Cellular Network Configuration | - | 0
Rotting bandits are not harder than stochastic ones | - | 0
Bandits with Temporal Stochastic Constraints | - | 0
Best Arm Identification in Linked Bandits | - | 0
Decentralized Exploration in Multi-Armed Bandits -- Extended version | - | 0
Sample complexity of partition identification using multi-armed bandits | - | 0
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits | - | 0
Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0
Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0
Multi-armed Bandits with Compensation | - | 0
Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling | Code | 1
Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging | Code | 0
Online learning with feedback graphs and switching costs | - | 0
Simple Regret Minimization for Contextual Bandits | - | 0
Fighting Contextual Bandits with Stochastic Smoothing | - | 0
Regularized Contextual Bandits | - | 0
Decentralized Cooperative Stochastic Bandits | Code | 0
Contextual Multi-Armed Bandits for Causal Marketing | - | 0
Thompson Sampling Algorithms for Cascading Bandits | - | 0
Page 43 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | - | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | - | Unverified
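The metric reported above is cumulative regret: the summed gap between the expected reward of the best arm and the expected reward of each arm actually played. A minimal sketch of how it is typically computed, assuming the per-arm expected rewards are known (the values below are hypothetical and unrelated to the benchmark numbers above):

    # Cumulative regret sketch; inputs are hypothetical example values.
    def cumulative_regret(true_means, arms_played):
        """true_means: expected reward per arm; arms_played: chosen arm indices over time."""
        best = max(true_means)
        return sum(best - true_means[a] for a in arms_played)

    # Three arms, ten pulls: regret = 0.4 + 0.2 + 0.4 = 1.0
    print(cumulative_regret([0.3, 0.5, 0.7], [0, 2, 2, 1, 2, 2, 0, 2, 2, 2]))

Lower cumulative regret is better; a policy that always played the best arm would incur zero regret.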