
Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed, limited amount of resources among competing choices (arms) whose reward distributions are only partially known, so as to maximize expected gain. Typically these problems involve an exploration/exploitation trade-off: the learner must balance gathering information about uncertain arms against repeatedly playing the arm that currently looks best.

(Image credit: Microsoft Research)
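
The exploration/exploitation trade-off is easiest to see in code. Below is a minimal sketch of an epsilon-greedy policy on a Bernoulli bandit; the function name, arm probabilities, and parameter values are illustrative assumptions, not drawn from any paper listed on this page.

```python
import random

def epsilon_greedy_bandit(arms, n_rounds, epsilon=0.1, seed=0):
    """Play a K-armed Bernoulli bandit with an epsilon-greedy policy.

    arms: true (unknown to the agent) success probabilities -- illustrative.
    Returns total reward and the per-arm empirical mean estimates.
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k        # number of pulls per arm
    means = [0.0] * k       # empirical mean reward per arm
    total = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            a = rng.randrange(k)                      # explore: random arm
        else:
            a = max(range(k), key=lambda i: means[i])  # exploit: best arm so far
        r = 1 if rng.random() < arms[a] else 0         # Bernoulli reward draw
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]         # incremental mean update
        total += r
    return total, means

if __name__ == "__main__":
    reward, estimates = epsilon_greedy_bandit([0.2, 0.5, 0.8], n_rounds=10_000)
    print(reward, [round(m, 3) for m in estimates])
```

With probability epsilon the agent explores a uniformly random arm; otherwise it exploits the empirically best one, so every arm's estimate keeps improving while most pulls go to the current favorite.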

Papers

Showing 351–400 of 1262 papers

Title | Status | Hype
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits | – | 0
Distributed Thompson Sampling | – | 0
An Empirical Evaluation of Thompson Sampling | – | 0
Best Arm Identification under Additive Transfer Bandits | – | 0
Multi-player Multi-armed Bandits for Stable Allocation in Heterogeneous Ad-Hoc Networks | – | 0
Best Arm Identification in Stochastic Bandits: Beyond β-optimality | – | 0
An Empirical Evaluation of Federated Contextual Bandit Algorithms | – | 0
Best Arm Identification in Restless Markov Multi-Armed Bandits | – | 0
Distributed Cooperative Decision Making in Multi-agent Multi-armed Bandits | – | 0
Best arm identification in multi-armed bandits with delayed feedback | – | 0
Best Arm Identification in Linked Bandits | – | 0
Discrete Choice Multi-Armed Bandits | – | 0
Best-Arm Identification in Correlated Multi-Armed Bandits | – | 0
An Efficient Algorithm for Deep Stochastic Contextual Bandits | – | 0
Adaptive Learning Rate for Follow-the-Regularized-Leader: Competitive Analysis and Best-of-Both-Worlds | – | 0
Active Reinforcement Learning: Observing Rewards at a Cost | – | 0
Diminishing Exploration: A Minimalist Approach to Piecewise Stationary Multi-Armed Bandits | – | 0
Diffusion Models Meet Contextual Bandits with Large Action Spaces | – | 0
Disentangling Exploration from Exploitation | – | 0
Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication | – | 0
Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme | – | 0
Distributed Differential Privacy in Multi-Armed Bandits | – | 0
Distributed Exploration in Multi-Armed Bandits | – | 0
Diffusion Approximations for Thompson Sampling | – | 0
Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost? | – | 0
Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints | – | 0
Distributed Online Learning via Cooperative Contextual Bandits | – | 0
Distributed Optimization via Kernelized Multi-armed Bandits | – | 0
Efficient Prompt Optimization Through the Lens of Best Arm Identification | – | 0
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives | – | 0
Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits | – | 0
Distributionally Robust Batch Contextual Bandits | – | 0
Differentially Private Multi-Armed Bandits in the Shuffle Model | – | 0
Distribution-Dependent Rates for Multi-Distribution Learning | – | 0
Diversify and Conquer: Bandits and Diversity for an Enhanced E-commerce Homepage Experience | – | 0
Diversity-Based Recruitment in Crowdsensing By Combinatorial Multi-Armed Bandits | – | 0
Differentially Private Kernelized Contextual Bandits | – | 0
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback | – | 0
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits | – | 0
Online Multi-Armed Bandits with Adaptive Inference | – | 0
Be Greedy in Multi-Armed Bandits | – | 0
Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards | – | 0
Meta-Learning Bandit Policies by Gradient Ascent | – | 0
Doubly robust off-policy evaluation with shrinkage | – | 0
Beam Learning -- Using Machine Learning for Finding Beam Directions | – | 0
Doubly Robust Policy Evaluation and Optimization | – | 0
A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits | – | 0
Designing Truthful Contextual Multi-Armed Bandits based Sponsored Search Auctions | – | 0
Designing an Interpretable Interface for Contextual Bandits | – | 0
BEACON: Balancing Convenience and Nutrition in Meals With Long-Term Group Recommendations and Reasoning on Multimodal Recipes | – | 0
Page 8 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | – | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | – | Unverified
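
Both benchmark entries appear to be full-posterior sampling (Thompson-sampling-style) agents with linear or neural-linear reward models; that reading of the model names is an assumption. For reference, here is a minimal Beta-Bernoulli Thompson sampling sketch, the simplest instance of posterior sampling; the benchmarked models maintain richer posteriors than the Beta posteriors used here, and all names and parameters below are illustrative.

```python
import random

def thompson_sampling(arms, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling: sample a mean from each arm's
    posterior, pull the argmax, then update that arm's Beta posterior.

    arms: true (unknown to the agent) success probabilities -- illustrative.
    Returns total reward and the per-arm posterior mean estimates.
    """
    rng = random.Random(seed)
    k = len(arms)
    alpha = [1.0] * k   # Beta(1, 1) uniform prior per arm
    beta = [1.0] * k
    total = 0
    for _ in range(n_rounds):
        # One posterior sample per arm; play the arm with the largest sample.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        a = max(range(k), key=lambda i: samples[i])
        r = 1 if rng.random() < arms[a] else 0   # Bernoulli reward draw
        alpha[a] += r                            # conjugate posterior update
        beta[a] += 1 - r
        total += r
    return total, [alpha[i] / (alpha[i] + beta[i]) for i in range(k)]

if __name__ == "__main__":
    reward, estimates = thompson_sampling([0.2, 0.5, 0.8], n_rounds=10_000)
    print(reward, [round(m, 3) for m in estimates])
```

Because arms are chosen in proportion to their posterior probability of being best, exploration falls away automatically as the posteriors concentrate, which is why posterior-sampling agents are a common baseline for cumulative-regret benchmarks like the one above.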