SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing alternatives (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms whose payoffs are uncertain against repeatedly pulling the arm that currently looks best.

(Image credit: Microsoft Research)
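The exploration/exploitation trade-off described above can be sketched with a minimal epsilon-greedy agent on a Bernoulli bandit. This is an illustrative sketch only: the arm means, `epsilon` value, horizon, and function name are assumptions for the example, not taken from any paper listed below.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    With probability epsilon the agent explores a uniformly random arm;
    otherwise it exploits the arm with the highest empirical mean so far.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return total_reward, counts

# Hypothetical example: three arms, the last one pays off most often.
reward, counts = epsilon_greedy([0.2, 0.5, 0.7])
```

With a small fixed `epsilon`, roughly a tenth of the pulls are spent exploring; more refined strategies (UCB, Thompson sampling) adapt the exploration rate per arm.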

Papers

Showing 101–150 of 1262 papers

Title | Status | Hype
Doubly-Robust Lasso Bandit | Code | 0
Human in the Loop Adaptive Optimization for Improved Time Series Forecasting | Code | 0
A Convex Framework for Confounding Robust Inference | Code | 0
Doubly Robust Policy Evaluation and Learning | Code | 0
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization | Code | 0
Infinite Action Contextual Bandits with Reusable Data Exhaust | Code | 0
Introduction to Multi-Armed Bandits | Code | 0
Invariant Policy Learning: A Causal Perspective | Code | 0
Addressing the Long-term Impact of ML Decisions via Policy Regret | Code | 0
Antithetic Sampling for Top-k Shapley Identification | Code | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0
Decentralized Cooperative Stochastic Bandits | Code | 0
Latent Bottlenecked Attentive Neural Processes | Code | 0
Learning Contextual Bandits in a Non-stationary Environment | Code | 0
Approximating a Target Distribution using Weight Queries | Code | 0
Let's Get It Started: Fostering the Discoverability of New Releases on Deezer | Code | 0
Adversarial Attacks on Combinatorial Multi-Armed Bandits | Code | 0
Locally Differentially Private (Contextual) Bandits Learning | Code | 0
AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration | Code | 0
MABSplit: Faster Forest Training Using Multi-Armed Bandits | Code | 0
Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints | Code | 0
Medoids in almost linear time via multi-armed bandits | Code | 0
A Survey of Online Experiment Design with the Stochastic Multi-Armed Bandit | Code | 0
Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits | Code | 0
A Survey on Contextual Multi-armed Bandits | Code | 0
Machine Teaching of Active Sequential Learners | Code | 0
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context | Code | 0
Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits | Code | 0
Dual-Mandate Patrols: Multi-Armed Bandits for Green Security | Code | 0
Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces | Code | 0
Contextual bandits with entropy-based human feedback | Code | 0
Contextual Bandits with Stochastic Experts | Code | 0
Conditionally Risk-Averse Contextual Bandits | Code | 0
Adaptive Action Duration with Contextual Bandits for Deep Reinforcement Learning in Dynamic Environments | Code | 0
Confidence Intervals for Policy Evaluation in Adaptive Experiments | Code | 0
Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks | Code | 0
A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits | Code | 0
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting | Code | 0
Constrained regret minimization for multi-criterion multi-armed bandits | Code | 0
Balanced off-policy evaluation in general action spaces | Code | 0
Contextual Bandits with Large Action Spaces: Made Practical | Code | 0
Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles | Code | 0
Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback | Code | 0
Correlated Multi-armed Bandits with a Latent Random Source | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Causally Abstracted Multi-armed Bandits | Code | 0
Combinatorial Bandits under Strategic Manipulations | Code | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
Causal Contextual Bandits with Adaptive Context | Code | 0
Page 3 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified
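The metric in the benchmark table, cumulative regret, is the gap between always pulling the best arm and the arms the algorithm actually chose, summed over rounds (lower is better). A minimal sketch, with hypothetical arm means and choices chosen for the example:

```python
def cumulative_regret(true_means, chosen_arms):
    """Cumulative (pseudo-)regret over a sequence of chosen arms.

    Each round contributes the gap between the best arm's mean
    and the mean of the arm actually pulled.
    """
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)

# Hypothetical example: the best arm has mean 0.7; the learner
# pulls a suboptimal arm twice before locking onto the best one.
r = cumulative_regret([0.2, 0.5, 0.7], [0, 1, 2, 2, 2])
# per-round contributions: 0.5 + 0.2 + 0 + 0 + 0
```

An algorithm whose cumulative regret grows sublinearly in the horizon is learning: its per-round regret shrinks toward zero.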