
Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed amount of resources among competing choices (arms) in a way that maximizes expected gain, when each choice's payoff is only partially known and becomes better understood as resources are allocated to it. Typically, these problems involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
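
The exploration/exploitation trade-off is easiest to see in a concrete policy. Below is a minimal sketch of the classic epsilon-greedy strategy on a Bernoulli bandit; the arm means, epsilon value, and horizon are illustrative assumptions and do not come from any paper listed on this page.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit; returns cumulative pseudo-regret."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward estimate per arm
    best = max(true_means)     # optimal arm's mean (known here only for scoring)
    regret = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore: random arm
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit: best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update
        regret += best - true_means[arm]
    return regret

# Hypothetical three-armed bandit with success probabilities 0.2, 0.5, 0.7.
print(epsilon_greedy([0.2, 0.5, 0.7]))
```

With a fixed epsilon, a constant fraction of pulls is spent exploring forever, so regret grows linearly in the horizon; decaying epsilon over time, or moving to UCB- or Thompson-sampling-style policies, is the kind of refinement many of the papers below study.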

Papers

Showing 201–225 of 1262 papers

Title | Status | Hype
Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks | Code | 0
Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents | Code | 0
Combinatorial Bandits under Strategic Manipulations | Code | 0
On-line Adaptative Curriculum Learning for GANs | Code | 0
Online Limited Memory Neural-Linear Bandits with Likelihood Matching | Code | 0
Online Matching: A Real-time Bandit System for Large-scale Recommendations | Code | 0
Model selection for contextual bandits | Code | 0
Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals | Code | 0
Information-Directed Selection for Top-Two Algorithms | Code | 0
Optimal Learning for Structured Bandits | Code | 0
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization | Code | 0
Optimistic Whittle Index Policy: Online Learning for Restless Bandits | Code | 0
Offline Contextual Bandits with Overparameterized Models | Code | 0
Combinatorial Multi-armed Bandits for Resource Allocation | Code | 0
Conditionally Risk-Averse Contextual Bandits | Code | 0
Persistency of Excitation for Robustness of Neural Networks | Code | 0
Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing | Code | 0
Causal Contextual Bandits with Adaptive Context | Code | 0
Addressing the Long-term Impact of ML Decisions via Policy Regret | Code | 0
Practical Bayesian Learning of Neural Networks via Adaptive Optimisation Methods | Code | 0
Causally Abstracted Multi-armed Bandits | Code | 0
Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback | Code | 0
Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles | Code | 0
Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning | Code | 0
Doubly-Robust Lasso Bandit | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
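
For context, cumulative regret is standardly defined as the expected shortfall relative to always playing the best arm: R_T = Σ_{t=1}^{T} (μ* − μ_{a_t}), where μ* is the best arm's mean and a_t is the arm pulled at step t. A minimal sketch of that computation follows; the leaderboard's exact evaluation protocol (horizon, environment, and any normalization behind the 1.92 and 1.82 figures) is not specified on this page, so treat this as the textbook definition only.

```python
def cumulative_regret(pulled_means, best_mean):
    """Textbook pseudo-regret: sum over steps of (best arm's mean - pulled arm's mean)."""
    return sum(best_mean - m for m in pulled_means)

# Hypothetical run: the learner pulls arms with means 0.5, 0.7, 0.7
# while the best available arm has mean 0.7.
print(cumulative_regret([0.5, 0.7, 0.7], best_mean=0.7))  # ~0.2
```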