
Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed, limited amount of resources must be allocated among competing choices so as to maximize the expected gain, when each choice's properties are only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: gathering more information about uncertain choices versus committing to the choice that currently looks best.

(Image credit: Microsoft Research)
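The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy sketch. This is a hypothetical toy implementation (function name, parameters, and Bernoulli-arm setup are all illustrative assumptions, not taken from any of the papers listed below):

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Run epsilon-greedy on a stochastic bandit with Bernoulli arms.

    With probability `epsilon` the agent explores (pulls a uniformly
    random arm); otherwise it exploits the arm with the highest
    empirical mean reward observed so far.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # number of pulls per arm
    estimates = [0.0] * k     # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward

# Toy run: three Bernoulli arms with unknown means 0.2, 0.5, 0.8.
est, reward = epsilon_greedy_bandit([0.2, 0.5, 0.8], epsilon=0.1, steps=5000)
```

After enough steps the empirical estimates concentrate around the true means, and the exploit branch settles on the best arm while the epsilon fraction of pulls keeps refining the others.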

Papers

Showing 151–200 of 1,262 papers

Bandit Regret Scaling with the Effective Loss Range
Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits
Bandits for Learning to Explain from Explanations
Bandits meet Computer Architecture: Designing a Smartly-allocated Cache
Bandit Social Learning: Exploration under Myopic Behavior
Bandits Warm-up Cold Recommender Systems
Preferences Evolve And So Should Your Bandits: Bandits with Evolving States for Online Platforms
Bandits with Knapsacks beyond the Worst Case
Bandits with Partially Observable Confounded Data
Bandits with Temporal Stochastic Constraints
Banker Online Mirror Descent
Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning
Batched Bandits with Crowd Externalities
Batched Coarse Ranking in Multi-Armed Bandits
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
Regret Bounds for Batched Bandits
Batched Nonparametric Bandits via k-Nearest Neighbor UCB
Breaking the (1/Δ_2) Barrier: Better Batched Best Arm Identification with Adaptive Grids
Batched Online Contextual Sparse Bandits with Sequential Inclusion of Features
Batched Thompson Sampling
Batched Thompson Sampling for Multi-Armed Bandits
Batch Ensemble for Variance Dependent Regret in Stochastic Bandits
Towards Bayesian Data Selection
Bayesian decision-making under misspecified priors with applications to meta-learning
An Analysis of Reinforcement Learning for Malaria Control
An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits
BEACON: Balancing Convenience and Nutrition in Meals With Long-Term Group Recommendations and Reasoning on Multimodal Recipes
Beam Learning -- Using Machine Learning for Finding Beam Directions
Be Greedy in Multi-Armed Bandits
Efficient Prompt Optimization Through the Lens of Best Arm Identification
Quantile Multi-Armed Bandits: Optimal Best-Arm Identification and a Differentially Private Scheme
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
Best Arm Identification in Linked Bandits
A Gang of Bandits
Best Arm Identification in Restless Markov Multi-Armed Bandits
Best Arm Identification in Stochastic Bandits: Beyond β-optimality
Best Arm Identification under Additive Transfer Bandits
An Empirical Evaluation of Thompson Sampling
Best-of-Both-Worlds Algorithms for Linear Contextual Bandits
Best-of-Both-Worlds Linear Contextual Bandits
Better Algorithms for Stochastic Bandits with Adversarial Corruptions
Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits
Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
Bi-Criteria Optimization for Combinatorial Bandits: Sublinear Regret and Constraint Violation under Bandit Feedback
BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits
BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits
Boltzmann Exploration Done Right
A framework for optimizing COVID-19 testing policy using a Multi Armed Bandit approach
Balanced Linear Contextual Bandits

Benchmark Results

#  Model                          Metric             Claimed  Verified  Status
1  NeuralLinear FullPosterior-MR  Cumulative regret  1.92               Unverified
2  Linear FullPosterior-MR        Cumulative regret  1.82               Unverified