SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain arms versus playing the arm that currently looks best.

(Image credit: Microsoft Research)
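The exploration/exploitation trade-off can be sketched with the simplest bandit policy, epsilon-greedy. This is a generic illustration, not taken from any paper listed below; the arm means, epsilon, and horizon are made-up toy values:

```python
import random

def epsilon_greedy(true_means, n_rounds=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms: with probability epsilon pull a
    random arm (explore), otherwise pull the arm with the highest running
    mean-reward estimate (exploit)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        # incremental mean update
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough rounds, the estimates concentrate around the true arm means and the policy spends most pulls on the best arm, at the cost of a constant exploration rate that contextual and best-of-both-worlds variants below try to improve on.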

Papers

Showing 251–300 of 1262 papers

Title (every paper on this page has an empty Status and a Hype score of 0, so those columns are omitted below)

Competing Bandits: The Perils of Exploration Under Competition
A framework for optimizing COVID-19 testing policy using a Multi Armed Bandit approach
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs
Concurrent Decentralized Channel Allocation and Access Point Selection using Multi-Armed Bandits in multi BSS WLANs
Balanced Linear Contextual Bandits
Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits
Contextual Multi-Armed Bandits for Causal Marketing
Autonomous Drug Design with Multi-Armed Bandits
AutoML for Contextual Bandits
A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback
Automatic Ensemble Learning for Online Influence Maximization
A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning
A Contextual Combinatorial Bandit Approach to Negotiation
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond
Contextual Multinomial Logit Bandits with General Value Functions
Contextual Bandits Evolving Over Finite Time
Contextual Bandits and Optimistically Universal Learning
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis
Contextual Bandits and Imitation Learning via Preference-Based Active Queries
Contextual Bandit Applications in Customer Support Bot
Contextual Bandits for adapting to changing User preferences over time
Contextual Bandits for Advertising Budget Allocation
Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version)
Contextual Bandits for Evaluating and Improving Inventory Control Policies
Contextual Bandits for Unbounded Context Distributions
Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning
Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning at Adyen
Linear Bandits with Stochastic Delayed Feedback
Contextual Bandits with Arm Request Costs and Delays
Contextual Bandits with Budgeted Information Reveal
Contextual bandits with concave rewards, and an application to fair ranking
Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting
Contextual Bandits with Cross-learning
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
Asymptotic Randomised Control with applications to bandits
Contextual Bandits with Knapsacks for a Conversion Model
Contextual Bandits with Latent Confounders: An NMF Approach
Contextual Bandits with Non-Stationary Correlated Rewards for User Association in MmWave Vehicular Networks
Contextual Bandits with Online Neural Regression
Contextual Bandits with Random Projection
Contextual Bandits with Side-Observations
Contextual Bandits with Similarity Information
BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System
Contextual Bandits with Sparse Data in Web setting
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation
Contextual Bandits with Stage-wise Constraints
Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications
Contextual Causal Bayesian Optimisation
Context-Aware Bandits
Page 6 of 26

Benchmark Results

# | Model                         | Metric            | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92    |          | Unverified
2 | Linear FullPosterior-MR       | Cumulative regret | 1.82    |          | Unverified
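The benchmark metric above is cumulative regret: the gap between the reward of always playing the best arm and the reward of the arms an algorithm actually chose, summed over rounds. A minimal sketch of the standard pseudo-regret computation (the function name and toy numbers are ours, not the benchmark's code):

```python
def cumulative_regret(true_means, chosen_arms):
    """Cumulative pseudo-regret: for each pulled arm, add the gap between
    the best arm's mean reward and the pulled arm's mean reward."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)

# toy example: arm 1 (mean 0.8) is optimal; arm 0 (mean 0.2) costs 0.6 per pull
regret = cumulative_regret([0.2, 0.8], [0, 1, 1, 0])  # two suboptimal pulls
```

Lower is better, so on this metric the Linear baseline's claimed 1.82 beats NeuralLinear's 1.92, though neither result has been verified.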