SOTAVerified

Multi-Armed Bandits

A multi-armed bandit is a problem in which a fixed amount of resources must be allocated among competing alternatives ("arms") so as to maximize expected gain, when each arm's reward distribution is only partially known. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain arms versus pulling the arm that currently looks best.
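The trade-off can be illustrated with a minimal epsilon-greedy sketch on a Bernoulli bandit (illustrative only; the arm success probabilities, epsilon, and round count below are made-up parameters, not taken from any listed paper):

```python
import random

def epsilon_greedy(true_means, n_rounds=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: explore with prob. epsilon,
    otherwise exploit the arm with the highest estimated mean reward."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # pulls per arm
    values = [0.0] * k      # running mean reward per arm
    total = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                         # explore: random arm
        else:
            arm = max(range(k), key=lambda a: values[a])   # exploit: best estimate
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total, counts

# Hypothetical three-armed bandit; the best arm has success probability 0.8.
total, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough rounds, the estimated means converge and the highest-mean arm receives the large majority of pulls, while the epsilon fraction of random pulls keeps every arm's estimate from going stale.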

(Image credit: Microsoft Research)

Papers

Showing 1101-1150 of 1262 papers

Title | Status | Hype
Multi-Armed Bandits with Correlated Arms | Code | 0
Jump Starting Bandits with LLM-Generated Prior Knowledge | Code | 0
Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes | Code | 0
Kernel Conditional Moment Constraints for Confounding Robust Inference | Code | 0
Q-Learning Lagrange Policies for Multi-Action Restless Bandits | Code | 0
Constrained regret minimization for multi-criterion multi-armed bandits | Code | 0
Top-k eXtreme Contextual Bandits with Arm Hierarchy | Code | 0
Optimal Regret Is Achievable with Bounded Approximate Inference Error: An Enhanced Bayesian Upper Confidence Bound Framework | Code | 0
Generalized Linear Bandits with Limited Adaptivity | Code | 0
Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards | Code | 0
On-line Adaptative Curriculum Learning for GANs | Code | 0
Bandit-Based Monte Carlo Optimization for Nearest Neighbors | Code | 0
Quantile Bandits for Best Arms Identification | Code | 0
Adaptive Linear Estimating Equations | Code | 0
Latent Bottlenecked Attentive Neural Processes | Code | 0
Multi-armed Bandits with Missing Outcome | Code | 0
Multi-Armed Bandits with Network Interference | Code | 0
Residual Loss Prediction: Reinforcement Learning With No Incremental Feedback | Code | 0
Multi-facet Contextual Bandits: A Neural Network Perspective | Code | 0
Smoothness-Adaptive Contextual Bandits | Code | 0
Fairness of Exposure in Online Restless Multi-armed Bandits | Code | 0
Learning Contextual Bandits in a Non-stationary Environment | Code | 0
Falcon: Fair Active Learning using Multi-armed Bandits | Code | 0
Optimistic Whittle Index Policy: Online Learning for Restless Bandits | Code | 0
Quantum exploration algorithms for multi-armed bandits | Code | 0
Contextual bandits with entropy-based human feedback | Code | 0
Fast Beam Alignment via Pure Exploration in Multi-armed Bandits | Code | 0
Contextual Bandits with Large Action Spaces: Made Practical | Code | 0
VacSIM: Learning Effective Strategies for COVID-19 Vaccine Distribution using Reinforcement Learning | Code | 0
Transfer Learning in Latent Contextual Bandits with Covariate Shift Through Causal Transportability | Code | 0
Quantum Natural Policy Gradients: Towards Sample-Efficient Reinforcement Learning | Code | 0
Output-Weighted Sampling for Multi-Armed Bandits with Extreme Payoffs | Code | 0
Offline Contextual Bandits with Overparameterized Models | Code | 0
Solving Inverse Problem for Multi-armed Bandits via Convex Optimization | Code | 0
Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces | Code | 0
Learning Structural Weight Uncertainty for Sequential Decision-Making | Code | 0
Nonstationary Continuum-Armed Bandit Strategies for Automated Trading in a Simulated Financial Market | Code | 0
Contextual Bandits with Stochastic Experts | Code | 0
Empirical analysis of representation learning and exploration in neural kernel bandits | Code | 0
Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards | Code | 0
Federated Multi-armed Bandits with Personalization | Code | 0
Federated Neural Bandits | Code | 0
Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior | Code | 0
Thompson Sampling for Bandit Learning in Matching Markets | Code | 0
Variational inference for the multi-armed contextual bandit | Code | 0
AC-Band: A Combinatorial Bandit-Based Approach to Algorithm Configuration | Code | 0
PageRank Bandits for Link Prediction | Code | 0
Stochastic Rising Bandits | Code | 0
Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles | Code | 0
Variance-Aware Linear UCB with Deep Representation for Neural Contextual Bandits | Code | 0
Page 23 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified