SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is the task of allocating a fixed amount of resources among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
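To make the exploration/exploitation trade-off concrete, here is a minimal sketch of an epsilon-greedy policy on a Bernoulli bandit. It is illustrative only and not drawn from any paper listed on this page; the arm probabilities, epsilon, and horizon are assumed values.

```python
# Minimal epsilon-greedy sketch on a stochastic (Bernoulli) multi-armed bandit.
# All parameters below are illustrative assumptions, not from this page.
import random

def epsilon_greedy_bandit(arm_probs, steps=1000, epsilon=0.1, seed=0):
    """Pull one arm per step: explore uniformly with probability epsilon,
    otherwise exploit the arm with the highest estimated mean reward."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)    # number of pulls per arm
    values = [0.0] * len(arm_probs)  # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))                        # explore
        else:
            arm = max(range(len(arm_probs)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0         # Bernoulli reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]            # incremental mean
        total_reward += reward
    return total_reward, values

if __name__ == "__main__":
    reward, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.7])
    print(f"total reward: {reward:.0f}, estimated arm means: {estimates}")
```

With a small epsilon the policy spends most pulls on the arm it currently believes is best, while the occasional random pull keeps refining the estimates of the other arms; this is the trade-off that the algorithms in the papers below formalize and improve on.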

Papers

Showing 201–250 of 1262 papers

Title | Status | Hype
NeuroSep-CP-LCB: A Deep Learning-based Contextual Multi-armed Bandit Algorithm with Uncertainty Quantification for Early Sepsis Prediction | Code | 0
Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit | Code | 0
Constrained regret minimization for multi-criterion multi-armed bandits | Code | 0
Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction | Code | 0
Model selection for contextual bandits | Code | 0
On-line Adaptative Curriculum Learning for GANs | Code | 0
On Private Online Convex Optimization: Optimal Algorithms in ℓ_p-Geometry and High Dimensional Contextual Bandits | Code | 0
Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals | Code | 0
The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms | Code | 0
Optimal Baseline Corrections for Off-Policy Contextual Bandits | Code | 0
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization | Code | 0
Information-Directed Selection for Top-Two Algorithms | Code | 0
Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0
Conditionally Risk-Averse Contextual Bandits | Code | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
Offline Contextual Bandits with Overparameterized Models | Code | 0
Contextual bandits with entropy-based human feedback | Code | 0
Causal Contextual Bandits with Adaptive Context | Code | 0
Addressing the Long-term Impact of ML Decisions via Policy Regret | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
Causally Abstracted Multi-armed Bandits | Code | 0
Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback | Code | 0
Doubly Robust Policy Evaluation and Optimization | Code | 0
Maximizing and Satisficing in Multi-armed Bandits with Graph Information | Code | 0
Quantile Bandits for Best Arms Identification | Code | 0
Quantum exploration algorithms for multi-armed bandits | Code | 0
Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits | Code | 0
Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems | Code | 0
Reinforcement Learning for Physical Layer Communications | Code | 0
Relational Boosted Bandits | Code | 0
Group Meritocratic Fairness in Linear Contextual Bandits | Code | 0
Combinatorial Bandits under Strategic Manipulations | Code | 0
Semiparametric Contextual Bandits | Code | 0
Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity | Code | 0
Adversarial Attacks on Combinatorial Multi-Armed Bandits | Code | 0
Combinatorial Multi-armed Bandits for Resource Allocation | Code | 0
Simultaneously Achieving Group Exposure Fairness and Within-Group Meritocracy in Stochastic Bandits | Code | 0
Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes | Code | 0
Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0
Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning | Code | 0
Test-Time Scaling of Diffusion Models via Noise Trajectory Search | Code | 0
The Assistive Multi-Armed Bandit | Code | 0
Thompson Sampling for Contextual Bandits with Linear Payoffs | Code | 0
Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits | Code | 0
Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents | Code | 0
Thompson Sampling for Multinomial Logit Contextual Bandits | Code | 0
Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks | Code | 0
A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit | Code | 0
Multi-Armed Bandits with Correlated Arms | Code | 0
Variational inference for the multi-armed contextual bandit | Code | 0
Page 5 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified