SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to problems in which a fixed amount of resources must be allocated among competing choices in a way that maximizes the expected gain, when each choice's payoff is only partially known in advance. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
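For readers new to the setting, here is a minimal, illustrative epsilon-greedy sketch in Python showing the exploration/exploitation trade-off. The arm probabilities, the `pull` callback, and the epsilon value below are made up for the example and are not taken from any paper listed on this page.

```python
import random

def epsilon_greedy_bandit(pull, n_arms, n_rounds, epsilon=0.1):
    """Allocate pulls across arms, balancing exploration and exploitation."""
    counts = [0] * n_arms    # times each arm has been pulled
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)  # explore: try a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best estimate so far
        reward = pull(arm)
        counts[arm] += 1
        # incremental update of the mean reward estimate for this arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return total_reward, values

# Example: three Bernoulli arms with (hypothetical) unknown success probabilities.
probs = [0.2, 0.5, 0.7]
reward_fn = lambda arm: 1.0 if random.random() < probs[arm] else 0.0
print(epsilon_greedy_bandit(reward_fn, n_arms=3, n_rounds=10_000))
```

Many of the papers below replace the simple epsilon-greedy rule with more refined strategies (UCB, Thompson sampling, information-directed sampling), but the underlying allocation problem is the same.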

Papers

Showing 551-600 of 1262 papers

Title | Status | Hype
Information-Directed Selection for Top-Two Algorithms | Code | 0
Neural Contextual Bandits Based Dynamic Sensor Selection for Low-Power Body-Area Networks | | 0
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs | | 0
Falsification of Multiple Requirements for Cyber-Physical Systems Using Online Generative Adversarial Networks and Multi-Armed Bandits | | 0
Contextual Information-Directed Sampling | | 0
Pessimism for Offline Linear Contextual Bandits using ℓ_p Confidence Sets | | 0
SplitPlace: AI Augmented Splitting and Placement of Large-Scale Neural Networks in Mobile Edge Environments | Code | 1
Stability Enforced Bandit Algorithms for Channel Selection in Remote State Estimation of Gauss-Markov Processes | | 0
Breaking the √T Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits | | 0
Multi-Armed Bandits in Brain-Computer Interfaces | Code | 0
Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs | | 0
Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization | | 0
From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses | | 0
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions | | 0
A Survey of Risk-Aware Multi-Armed Bandits | | 0
Selectively Contextual Bandits | | 0
Federated Multi-Armed Bandits Under Byzantine Attacks | | 0
Pervasive Machine Learning for Smart Radio Environments Enabled by Reconfigurable Intelligent Surfaces | Code | 1
Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications | | 0
Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling | Code | 0
Thompson Sampling for Bandit Learning in Matching Markets | Code | 0
Rate-Constrained Remote Contextual Bandits | | 0
Worst-case Performance of Greedy Policies in Bandits with Imperfect Context Observations | | 0
Stochastic Multi-armed Bandits with Non-stationary Rewards Generated by a Linear Dynamical System | | 0
Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk | | 0
Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracles | | 0
Best Arm Identification in Restless Markov Multi-Armed Bandits | | 0
On Kernelized Multi-Armed Bandits with Constraints | | 0
Modeling Attrition in Recommender Systems with Departing Bandits | | 0
Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking | Code | 0
Efficient Algorithms for Extreme Bandits | Code | 0
Approximate Function Evaluation via Multi-Armed Bandits | | 0
Reinforced Meta Active Learning | | 0
Reward-Biased Maximum Likelihood Estimation for Neural Contextual Bandits | | 0
PAC-Bayesian Lifelong Learning For Multi-Armed Bandits | | 0
Restless Multi-Armed Bandits under Exogenous Global Markov Process | | 0
Federated Online Sparse Decision Making | | 0
Truncated LinUCB for Stochastic Linear Bandits | Code | 0
The Pareto Frontier of Instance-Dependent Guarantees in Multi-Player Multi-Armed Bandits with no Communication | | 0
Cost-Efficient Distributed Learning via Combinatorial Multi-Armed Bandits | | 0
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences | | 0
Off-Policy Evaluation for Large Action Spaces via Embeddings | Code | 2
Shuffle Private Linear Contextual Bandits | | 0
Efficient Kernel UCB for Contextual Bandits | Code | 0
Remote Contextual Bandits | | 0
Settling the Communication Complexity for Distributed Offline Reinforcement Learning | | 0
Smoothed Online Learning is as Easy as Statistical Learning | | 0
Budgeted Combinatorial Multi-Armed Bandits | | 0
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | | 0
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model | Code | 2
Page 12 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
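For context on the metric above: cumulative regret is the total expected reward lost by an algorithm's arm choices relative to always playing the best arm. A minimal sketch follows, assuming the (pseudo-)regret is computed from known arm means purely for illustration; the arm means and chosen-arm sequence below are hypothetical and unrelated to the benchmark rows.

```python
def cumulative_regret(mean_rewards, chosen_arms):
    # Total expected reward lost versus always playing the arm with the highest mean.
    best = max(mean_rewards)
    return sum(best - mean_rewards[arm] for arm in chosen_arms)

# Example with hypothetical arm means 0.2, 0.5, 0.7:
# choosing arms [0, 2, 2, 1, 2] costs (0.7-0.2) + 0 + 0 + (0.7-0.5) + 0 = 0.7
print(cumulative_regret([0.2, 0.5, 0.7], [0, 2, 2, 1, 2]))
```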