SOTAVerified

Multi-Armed Bandits

Multi-armed bandits are a class of problems in which a fixed budget of resources must be allocated among competing alternatives so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance gathering information about uncertain options against committing to the option that currently looks best.
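As a minimal illustration of the exploration/exploitation trade-off, here is a sketch of the classic epsilon-greedy strategy on a synthetic Bernoulli bandit. The arm means and parameter values are invented for the example; this is not taken from any paper listed below.

```python
import random

def epsilon_greedy(true_means, steps=10000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit.

    With probability epsilon, explore a random arm; otherwise exploit
    the arm with the highest current reward estimate. Returns the pull
    counts and empirical mean estimates per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        # Bernoulli reward: 1 with probability true_means[arm], else 0
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

# Hypothetical 3-armed bandit: arm 2 (mean 0.7) is optimal.
counts, estimates = epsilon_greedy([0.3, 0.5, 0.7])
```

After enough steps, the optimal arm accumulates the large majority of pulls, while the epsilon fraction of exploratory pulls keeps the estimates for the other arms from going stale.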

(Image credit: Microsoft Research)

Papers

Showing 51–100 of 1262 papers

Title | Status | Hype
Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits | Code | 0
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization | Code | 0
Efficient Kernel UCB for Contextual Bandits | Code | 0
Addressing the Long-term Impact of ML Decisions via Policy Regret | Code | 0
Estimation of Warfarin Dosage with Reinforcement Learning | Code | 0
Evaluating Deep Vs. Wide & Deep Learners As Contextual Bandits For Personalized Email Promo Recommendations | Code | 0
Fairness of Exposure in Online Restless Multi-armed Bandits | Code | 0
Adversarial Attacks on Combinatorial Multi-Armed Bandits | Code | 0
Adapting multi-armed bandits policies to contextual bandits scenarios | Code | 0
Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards | Code | 0
Federated Neural Bandits | Code | 0
Finding All ε-Good Arms in Stochastic Bandits | Code | 0
Empirical analysis of representation learning and exploration in neural kernel bandits | Code | 0
Doubly robust off-policy evaluation with shrinkage | Code | 0
Doubly Robust Policy Evaluation and Learning | Code | 0
Gaussian Gated Linear Networks | Code | 0
An Empirical Evaluation of Federated Contextual Bandit Algorithms | Code | 0
Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling | Code | 0
Decentralized Cooperative Stochastic Bandits | Code | 0
Correlated Multi-armed Bandits with a Latent Random Source | Code | 0
Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach | Code | 0
Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits | Code | 0
Contextual Bandits with Stochastic Experts | Code | 0
Active Feature Selection for the Mutual Information Criterion | Code | 0
Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles | Code | 0
Contextual bandits with entropy-based human feedback | Code | 0
Confidence Intervals for Policy Evaluation in Adaptive Experiments | Code | 0
Conditionally Risk-Averse Contextual Bandits | Code | 0
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting | Code | 0
Contextual Bandits with Large Action Spaces: Made Practical | Code | 0
(Almost) Free Incentivized Exploration from Decentralized Learning Agents | Code | 0
Constrained regret minimization for multi-criterion multi-armed bandits | Code | 0
Adaptive Estimator Selection for Off-Policy Evaluation | Code | 0
Corralling a Band of Bandit Algorithms | Code | 0
Adaptive Experimentation with Delayed Binary Feedback | Code | 0
Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces | Code | 0
Doubly-Robust Lasso Bandit | Code | 0
A New Bandit Setting Balancing Information from State Evolution and Corrupted Context | Code | 0
Scalable Exploration via Ensemble++ | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Combinatorial Bandits under Strategic Manipulations | Code | 0
Adaptive Data Depth via Multi-Armed Bandits | Code | 0
Combinatorial Multi-armed Bandits for Resource Allocation | Code | 0
Adaptive Linear Estimating Equations | Code | 0
Causally Abstracted Multi-armed Bandits | Code | 0
A Convex Framework for Confounding Robust Inference | Code | 0
Bandit-Based Monte Carlo Optimization for Nearest Neighbors | Code | 0
An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits | Code | 0
Safe and Adaptive Decision-Making for Optimization of Safety-Critical Systems: The ARTEO Algorithm | Code | 0
Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified