
Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices in a way that maximizes expected gain. These problems typically involve an exploration/exploitation trade-off.
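To make the exploration/exploitation trade-off concrete, here is a minimal sketch of epsilon-greedy, one common bandit strategy among many (others include UCB and Thompson sampling). It is not the method of any paper listed on this page. The function name `epsilon_greedy`, the Bernoulli reward model, and the default `epsilon=0.1` are illustrative assumptions.

```python
import random

def epsilon_greedy(true_means, n_rounds, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (hypothetical setup).

    true_means: assumed success probability of each arm.
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # how many times each arm was pulled
    estimates = [0.0] * k     # running mean reward observed for each arm
    total_reward = 0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best estimate so far
        reward = 1 if rng.random() < true_means[arm] else 0  # Bernoulli reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return counts, total_reward
```

For example, `epsilon_greedy([0.2, 0.5, 0.8], 10_000)` should spend most of its pulls on the third arm while still sampling the others about `epsilon` of the time; a larger `epsilon` explores more at the cost of exploiting less.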

(Image credit: Microsoft Research)

Papers

Showing 376-400 of 1262 papers

Title | Status | Hype
Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints | | 0
Distributed Online Learning via Cooperative Contextual Bandits | | 0
Distributed Optimization via Kernelized Multi-armed Bandits | | 0
Efficient Prompt Optimization Through the Lens of Best Arm Identification | | 0
An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives | | 0
Distributionally Robust Policy Evaluation and Learning in Offline Contextual Bandits | | 0
Distributionally Robust Batch Contextual Bandits | | 0
Differentially Private Multi-Armed Bandits in the Shuffle Model | | 0
Distribution-Dependent Rates for Multi-Distribution Learning | | 0
Diversify and Conquer: Bandits and Diversity for an Enhanced E-commerce Homepage Experience | | 0
Diversity-Based Recruitment in Crowdsensing By Combinatorial Multi-Armed Bandits | | 0
Differentially Private Kernelized Contextual Bandits | | 0
DOPL: Direct Online Preference Learning for Restless Bandits with Preference Feedback | | 0
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits | | 0
Online Multi-Armed Bandits with Adaptive Inference | | 0
Be Greedy in Multi-Armed Bandits | | 0
Differentially Private Episodic Reinforcement Learning with Heavy-tailed Rewards | | 0
Meta-Learning Bandit Policies by Gradient Ascent | | 0
Doubly robust off-policy evaluation with shrinkage | | 0
Beam Learning -- Using Machine Learning for Finding Beam Directions | | 0
Doubly Robust Policy Evaluation and Optimization | | 0
A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits | | 0
Designing Truthful Contextual Multi-Armed Bandits based Sponsored Search Auctions | | 0
Designing an Interpretable Interface for Contextual Bandits | | 0
BEACON: Balancing Convenience and Nutrition in Meals With Long-Term Group Recommendations and Reasoning on Multimodal Recipes | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
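The leaderboard above ranks models by cumulative regret. As a rough illustration, the textbook (pseudo-)regret after T rounds is the sum over rounds of the gap between the best arm's expected reward and the expected reward of the arm actually chosen. The sketch below assumes the true arm means are known, as in a simulation; the function name `cumulative_regret` is hypothetical, and the claimed figures in the table may be computed or normalized differently by the original benchmark.

```python
def cumulative_regret(true_means, chosen_arms):
    """Running pseudo-regret: sum of (best mean - mean of chosen arm) per round.

    true_means: assumed known expected reward of each arm (simulation setting).
    chosen_arms: index of the arm pulled in each round.
    Returns the cumulative regret after each round.
    """
    best = max(true_means)
    total = 0.0
    history = []
    for arm in chosen_arms:
        total += best - true_means[arm]  # per-round gap to the optimal arm
        history.append(total)
    return history

print(cumulative_regret([0.25, 0.5, 1.0], [0, 1, 2, 2]))  # prints [0.75, 1.25, 1.25, 1.25]
```

Regret stops growing once the policy settles on the best arm, which is why lower cumulative regret indicates a better balance between exploring and exploiting.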