
Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed set of resources must be allocated between competing alternatives (arms) in a way that maximizes expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
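
The exploration/exploitation trade-off is easiest to see in a small simulation. Below is a minimal, illustrative Python sketch of the classic epsilon-greedy strategy on a Bernoulli bandit; the arm payout probabilities, epsilon, and horizon are made-up values for the example, not taken from any paper listed here.

```python
import random

def epsilon_greedy(true_probs, epsilon=0.1, horizon=1000):
    """Run epsilon-greedy on a Bernoulli bandit; return the total reward collected."""
    n_arms = len(true_probs)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)  # explore: try a random arm
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit: best mean so far
        reward = 1.0 if random.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total

# Illustrative run: three arms with hidden payout rates 0.2, 0.5, 0.7.
print(epsilon_greedy([0.2, 0.5, 0.7]))
```

With epsilon = 0 the agent can lock onto whichever arm happened to pay out first; a small positive epsilon keeps sampling the alternatives. Balancing that cost of exploration against the gain from better estimates is the trade-off the papers below study.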

Papers

Showing 851–900 of 1262 papers

Title | Status | Hype
Multi-Armed Bandits for Minesweeper: Profiting from Exploration-Exploitation Synergy |  | 0
Competing Bandits: The Perils of Exploration Under Competition |  | 0
Minimax Policy for Heavy-tailed Bandits |  | 0
Self-Tuning Bandits over Unknown Covariate-Shifts |  | 0
Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits |  | 0
Quantum exploration algorithms for multi-armed bandits | Code | 0
Optimal Learning for Structured Bandits | Code | 0
Fair Algorithms for Multi-Agent Multi-Armed Bandits |  | 0
Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits | Code | 0
Robust Multi-Agent Multi-Armed Bandits |  | 0
Multi-Armed Bandits with Local Differential Privacy |  | 0
Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design |  | 0
Continuous-Time Multi-Armed Bandits with Controlled Restarts |  | 0
Offline Contextual Bandits with Overparameterized Models | Code | 0
Online learning with Corrupted context: Corrupted Contextual Bandits |  | 0
Approximating a Target Distribution using Weight Queries | Code | 0
Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning |  | 0
Towards Tractable Optimism in Model-Based Reinforcement Learning |  | 0
Open Problem: Model Selection for Contextual Bandits |  | 0
Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect |  | 0
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting | Code | 0
Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach |  | 0
Stochastic Bandits with Linear Constraints |  | 0
Constrained regret minimization for multi-criterion multi-armed bandits | Code | 0
Finding All ε-Good Arms in Stochastic Bandits | Code | 0
Non-Stationary Off-Policy Optimization |  | 0
Explicit Best Arm Identification in Linear Bandits Using No-Regret Learners |  | 0
Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme |  | 0
TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation |  | 0
Bandits with Partially Observable Confounded Data |  | 0
Gaussian Gated Linear Networks | Code | 0
Distributionally Robust Batch Contextual Bandits |  | 0
Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition |  | 0
Meta-Learning Bandit Policies by Gradient Ascent |  | 0
Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior | Code | 0
Contextual Bandits with Side-Observations |  | 0
Concurrent Decentralized Channel Allocation and Access Point Selection using Multi-Armed Bandits in multi BSS WLANs |  | 0
Locally Differentially Private (Contextual) Bandits Learning | Code | 0
(Locally) Differentially Private Combinatorial Semi-Bandits |  | 0
To update or not to update? Delayed Nonparametric Bandits with Randomized Allocation |  | 0
Greedy Algorithm almost Dominates in Smoothed Contextual Bandits |  | 0
Neural Network Retraining for Model Serving |  | 0
Learning to Rank in the Position Based Model with Bandit Feedback |  | 0
Thompson Sampling for Linearly Constrained Bandits | Code | 0
Sequential Batch Learning in Finite-Action Linear Contextual Bandits |  | 0
Power Constrained Bandits | Code | 0
Exploration with Limited Memory: Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-Armed Bandits |  | 0
Hawkes Process Multi-armed Bandits for Disaster Search and Rescue |  | 0
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability |  | 0
Optimal No-regret Learning in Repeated First-price Auctions |  | 0
Page 18 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 |  | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 |  | Unverified
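
For context, cumulative regret (the metric in this table) is conventionally the gap between the expected reward of always playing the best arm and the expected reward the algorithm actually collects, summed over the horizon; lower is better. A minimal sketch of the computation, with made-up arm means and a made-up pull sequence:

```python
def cumulative_regret(true_means, arms_played):
    """Expected cumulative regret of a pull sequence versus the best fixed arm."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in arms_played)

# Illustrative: five pulls of an arm that is 0.3 worse than the best one.
print(cumulative_regret([0.4, 0.7], [0, 0, 0, 0, 0]))  # about 1.5
```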