SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.
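As a concrete illustration of that trade-off, here is a minimal sketch of an epsilon-greedy policy, one of the simplest bandit algorithms: with probability epsilon it explores a random arm, and otherwise it exploits the arm with the best empirical mean so far. All function names, parameters, and reward distributions below are illustrative assumptions, not taken from any specific paper on this page.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Simulate an epsilon-greedy policy on a stochastic multi-armed bandit.

    true_means: expected reward of each arm (unknown to the agent).
    With probability epsilon we explore (pull a random arm); otherwise
    we exploit the arm with the highest empirical mean so far.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # number of pulls per arm
    estimates = [0.0] * n_arms     # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)                  # noisy payoff
        counts[arm] += 1
        # Incremental update of the empirical mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return total_reward, estimates

if __name__ == "__main__":
    reward, est = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print(f"total reward: {reward:.1f}, estimates: {[round(e, 2) for e in est]}")
```

With a small epsilon the agent spends most pulls on the empirically best arm while still sampling the others often enough to correct bad early estimates.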

(Image credit: Microsoft Research)

Papers

Showing 1051–1100 of 1262 papers

Title | Status | Hype
Balanced Linear Contextual Bandits | | 0
ADARES: Adaptive Resource Management for Virtual Machines | | 0
Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward | | 0
Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task | | 0
A Bandit Approach to Sequential Experimental Design with False Discovery Control | | 0
Stochastic Top-K Subset Bandits with Linear Space and Non-Linear Feedback | | 0
Adversarial Bandits with Knapsacks | | 0
Kernel-based Multi-Task Contextual Bandits in Cellular Network Configuration | | 0
Rotting bandits are not harder than stochastic ones | | 0
Bandits with Temporal Stochastic Constraints | | 0
Decentralized Exploration in Multi-Armed Bandits -- Extended version | | 0
Best Arm Identification in Linked Bandits | | 0
Sample complexity of partition identification using multi-armed bandits | | 0
Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits | | 0
Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0
Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0
Multi-armed Bandits with Compensation | | 0
Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging | Code | 0
Online learning with feedback graphs and switching costs | | 0
Simple Regret Minimization for Contextual Bandits | | 0
Regularized Contextual Bandits | | 0
Fighting Contextual Bandits with Stochastic Smoothing | | 0
Decentralized Cooperative Stochastic Bandits | Code | 0
Thompson Sampling Algorithms for Cascading Bandits | | 0
Contextual Multi-Armed Bandits for Causal Marketing | | 0
Contextual Bandits with Cross-learning | | 0
SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits | Code | 0
Multi-Player Bandits: A Trekking Approach | | 0
Machine Teaching of Active Sequential Learners | Code | 0
Diversity-Driven Selection of Exploration Strategies in Multi-Armed Bandits | | 0
Correlated Multi-armed Bandits with a Latent Random Source | Code | 0
Data Poisoning Attacks in Contextual Bandits | | 0
Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0
On-line Adaptative Curriculum Learning for GANs | Code | 0
Preference-based Online Learning with Dueling Bandits: A Survey | | 0
Deep Contextual Multi-armed Bandits | | 0
Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits | | 0
Linear Bandits with Stochastic Delayed Feedback | | 0
Multi-User Multi-Armed Bandits for Uncoordinated Spectrum Access | | 0
Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems | | 0
Contextual bandits with surrogate losses: Margin bounds and efficient algorithms | | 0
Greybox fuzzing as a contextual bandits problem | | 0
Finding the bandit in a graph: Sequential search-and-stop | | 0
Mitigating Bias in Adaptive Data Gathering via Differential Privacy | | 0
A General Framework for Bandit Problems Beyond Cumulative Objectives | | 0
The Externalities of Exploration and How Data Diversity Helps Exploitation | | 0
Myopic Bayesian Design of Experiments via Posterior Sampling and Probabilistic Programming | Code | 0
Learning Contextual Bandits in a Non-stationary Environment | Code | 0
Multi-Statistic Approximate Bayesian Computation with Multi-Armed Bandits | | 0
Bandit-Based Monte Carlo Optimization for Nearest Neighbors | Code | 0
Page 22 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
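For reference, cumulative regret (the metric reported above) measures the gap between the reward an oracle that always plays the best arm would collect and the reward the algorithm actually collects. The helper below is a minimal sketch of that definition under the assumption that the true arm means are known to the evaluator; it is not the benchmark's actual evaluation code, which is not specified on this page.

```python
def cumulative_regret(true_means, arms_pulled):
    # Pseudo-regret: expected shortfall versus always playing the best arm.
    # `true_means` and `arms_pulled` are illustrative inputs; the benchmark's
    # exact evaluation pipeline is not described here.
    best = max(true_means)
    return sum(best - true_means[a] for a in arms_pulled)

# Example: arm means 0.2/0.5/0.8; pulling arms 0, 2, 2, 1 gives regret
# (0.8 - 0.2) + 0 + 0 + (0.8 - 0.5) = 0.9
print(cumulative_regret([0.2, 0.5, 0.8], [0, 2, 2, 1]))
```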