SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
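The exploration/exploitation trade-off described above can be illustrated with an epsilon-greedy strategy, one of the simplest bandit algorithms: with probability epsilon pull a random arm (explore), otherwise pull the arm with the best estimated reward (exploit). This is a minimal sketch, not any specific paper's method; the arm probabilities and parameter values are hypothetical.

```python
import random

def epsilon_greedy(reward_fn, n_arms, horizon, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit loop (illustrative sketch)."""
    rng = random.Random(seed)
    counts = [0] * n_arms       # pulls per arm
    values = [0.0] * n_arms     # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)   # exploit
        r = reward_fn(arm, rng)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]         # incremental mean
        total_reward += r
    return values, total_reward

# Hypothetical Bernoulli arms with success probabilities 0.2, 0.5, 0.8.
probs = [0.2, 0.5, 0.8]
values, total_reward = epsilon_greedy(
    lambda a, rng: 1.0 if rng.random() < probs[a] else 0.0,
    n_arms=3, horizon=5000)
```

With enough pulls the estimate for the best arm (index 2 here) dominates, so the exploit step concentrates on it while the epsilon fraction of pulls keeps exploring the others.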

Papers

Showing 976–1000 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli Rewards | | 0 |
| Optimal Activation of Halting Multi-Armed Bandit Models | | 0 |
| Optimal Algorithms for Range Searching over Multi-Armed Bandits | | 0 |
| Optimal Algorithms for Stochastic Contextual Preference Bandits | | 0 |
| Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards | | 0 |
| Optimal and Adaptive Off-policy Evaluation in Contextual Bandits | | 0 |
| Optimal Best-Arm Identification under Fixed Confidence with Multiple Optima | | 0 |
| Optimal cross-learning for contextual bandits with unknown context distributions | | 0 |
| Optimal Multitask Linear Regression and Contextual Bandits under Sparse Heterogeneity | | 0 |
| Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks | | 0 |
| Towards Costless Model Selection in Contextual Bandits: A Bias-Variance Perspective | | 0 |
| Optimal Multi-Objective Best Arm Identification with Fixed Confidence | | 0 |
| Optimal No-regret Learning in Repeated First-price Auctions | | 0 |
| Optimal Rates of (Locally) Differentially Private Heavy-tailed Multi-Armed Bandits | | 0 |
| Optimal Streaming Algorithms for Multi-Armed Bandits | | 0 |
| Optimistic Information Directed Sampling | | 0 |
| Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits | | 0 |
| Optimizing Online Advertising with Multi-Armed Bandits: Mitigating the Cold Start Problem under Auction Dynamics | | 0 |
| Optimizing Sharpe Ratio: Risk-Adjusted Decision-Making in Multi-Armed Bandits | | 0 |
| Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits | | 0 |
| OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits | | 0 |
| PAC-Bayesian Analysis of Contextual Bandits | | 0 |
| PAC-Bayesian Lifelong Learning For Multi-Armed Bandits | | 0 |
| PAC-Bayesian Offline Contextual Bandits With Guarantees | | 0 |
| PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits | | 0 |
Page 40 of 51

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |