
Multi-Armed Bandits

Multi-armed bandits are a class of sequential decision problems in which a limited set of resources must be allocated among competing choices (arms) in a way that maximizes expected gain, when each arm's payoff is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
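
To make the exploration/exploitation trade-off concrete, here is a minimal epsilon-greedy sketch for a Bernoulli bandit. This is an illustration only, not any of the algorithms listed below; the function name, arm probabilities, and parameter values are made up for the example.

```python
import random

def run_epsilon_greedy(true_probs, steps=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit with hypothetical arm probabilities."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit best estimate
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return values, counts, total_reward

if __name__ == "__main__":
    # Arm success probabilities are invented for illustration.
    estimates, pulls, reward = run_epsilon_greedy([0.2, 0.5, 0.7])
    print("estimated means:", [round(v, 3) for v in estimates])
    print("pulls per arm:", pulls)
```

With a small epsilon, most pulls concentrate on the empirically best arm while occasional random pulls keep the estimates of the other arms from going stale; this is the simplest form of the trade-off the papers below study in far more general settings.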

Papers

Showing 251–275 of 1262 papers

Title | Status | Hype
Competing Bandits: The Perils of Exploration Under Competition | | 0
Balanced Linear Contextual Bandits | | 0
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs | | 0
Concurrent Decentralized Channel Allocation and Access Point Selection using Multi-Armed Bandits in multi BSS WLANs | | 0
A framework for optimizing COVID-19 testing policy using a Multi Armed Bandit approach | | 0
Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits | | 0
Autonomous Drug Design with Multi-Armed Bandits | | 0
AutoML for Contextual Bandits | | 0
A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback | | 0
Automatic Ensemble Learning for Online Influence Maximization | | 0
A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning | | 0
A Contextual Combinatorial Bandit Approach to Negotiation | | 0
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond | | 0
Contextual Bandits with Arm Request Costs and Delays | | 0
Contextual Bandits Evolving Over Finite Time | | 0
Contextual Bandits and Optimistically Universal Learning | | 0
Augmenting Online RL with Offline Data is All You Need: A Unified Hybrid RL Algorithm Design and Analysis | | 0
Contextual Bandits and Imitation Learning via Preference-Based Active Queries | | 0
Contextual Bandit Applications in Customer Support Bot | | 0
Asymptotic Randomised Control with applications to bandits | | 0
Contextual Bandits for adapting to changing User preferences over time | | 0
Contextual Bandits for Advertising Budget Allocation | | 0
Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version) | | 0
Contextual Bandits for Evaluating and Improving Inventory Control Policies | | 0
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
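
The benchmark metric above is cumulative regret, where lower is better. For reference, the standard textbook definition is given below; the leaderboard's exact computation may differ in normalization or averaging, so treat this as the general form rather than the benchmark's formula.

```latex
% Cumulative (pseudo-)regret after T rounds: the gap between always pulling
% the best arm and the rewards the algorithm actually collects.
% \mu_a is the mean reward of arm a, r_t the reward received at round t.
R(T) \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_{t}\right],
\qquad \mu^{*} \;=\; \max_{a}\,\mu_{a}
```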