SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying under-sampled arms against repeatedly pulling the arm that currently looks best.

(Image credit: Microsoft Research)
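The exploration/exploitation trade-off described above can be sketched with a minimal epsilon-greedy strategy (a standard baseline, not any specific algorithm from the papers listed below); the arm means and parameters here are illustrative assumptions.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=10000, seed=0):
    """Minimal epsilon-greedy bandit: with probability epsilon pick a
    random arm (explore), otherwise pull the empirically best arm
    (exploit). Rewards are simulated as Gaussian draws around each
    arm's true mean (an assumption for this sketch)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                       # explore
            arm = rng.randrange(n_arms)
        else:                                            # exploit
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)         # noisy payoff
        counts[arm] += 1
        # incremental update of the running mean for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

estimates, total = epsilon_greedy([0.1, 0.5, 0.9])
best_arm = max(range(3), key=lambda a: estimates[a])
```

With enough pulls the empirical estimates concentrate around the true means, so the exploit step settles on the highest-mean arm while the epsilon fraction of rounds keeps sampling the others.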

Papers

Showing 126–150 of 1262 papers

Title | Status | Hype
Machine Teaching of Active Sequential Learners | Code | 0
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context | Code | 0
Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits | Code | 0
Dual-Mandate Patrols: Multi-Armed Bandits for Green Security | Code | 0
Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces | Code | 0
Contextual bandits with entropy-based human feedback | Code | 0
Contextual Bandits with Stochastic Experts | Code | 0
Conditionally Risk-Averse Contextual Bandits | Code | 0
Adaptive Action Duration with Contextual Bandits for Deep Reinforcement Learning in Dynamic Environments | Code | 0
Confidence Intervals for Policy Evaluation in Adaptive Experiments | Code | 0
Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks | Code | 0
A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits | Code | 0
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting | Code | 0
Constrained regret minimization for multi-criterion multi-armed bandits | Code | 0
Balanced off-policy evaluation in general action spaces | Code | 0
Contextual Bandits with Large Action Spaces: Made Practical | Code | 0
Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles | Code | 0
Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback | Code | 0
Correlated Multi-armed Bandits with a Latent Random Source | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Causally Abstracted Multi-armed Bandits | Code | 0
Combinatorial Bandits under Strategic Manipulations | Code | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
Causal Contextual Bandits with Adaptive Context | Code | 0
Page 6 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
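The metric in the table above, cumulative regret, is the standard bandit yardstick: the total gap, summed over rounds, between the expected reward of the best arm and that of the arm actually pulled. A minimal sketch (the arm means here are made-up illustration values, not from the benchmark):

```python
def cumulative_regret(chosen_arms, true_means):
    """Cumulative (pseudo-)regret: for each round, add the gap between
    the best arm's mean reward and the chosen arm's mean reward."""
    best = max(true_means)
    return sum(best - true_means[a] for a in chosen_arms)

# Always pulling the worst of three arms for four rounds accrues
# a per-round gap of 0.9 - 0.1 = 0.8, i.e. 3.2 in total.
regret = cumulative_regret([0, 0, 0, 0], [0.1, 0.5, 0.9])
```

A good algorithm keeps this quantity growing sublinearly in the number of rounds, which is why lower values in the table indicate stronger performance.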