SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed budget of resources must be allocated among competing choices (arms) so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms whose payoffs are uncertain against repeatedly pulling the arm that currently looks best.

(Image credit: Microsoft Research)
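The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy strategy on Bernoulli arms. This is a generic sketch, not taken from any paper listed below; the arm means, step budget, and epsilon value are illustrative assumptions.

```python
import random

def epsilon_greedy(true_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit sketch: with probability epsilon pull a
    random arm (explore); otherwise pull the arm with the highest
    empirical mean reward so far (exploit). Arms are Bernoulli with
    the given (illustrative) success probabilities."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean for this arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, values, total_reward
```

With enough steps, most pulls concentrate on the best arm, while the epsilon fraction of random pulls keeps estimating the others; cumulative regret (the metric in the benchmark table below) measures the reward lost relative to always pulling the best arm.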

Papers

Showing 1001–1025 of 1262 papers

Title | Status | Hype
Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits | — | 0
Value Directed Exploration in Multi-Armed Bandits with Structured Priors | — | 0
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency | — | 0
Variance-Dependent Regret Lower Bounds for Contextual Bandits | — | 0
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | — | 0
Variational Inference for Model-Free and Model-Based Reinforcement Learning | — | 0
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences | — | 0
Vertical Federated Linear Contextual Bandits | — | 0
Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits | — | 0
Bandit algorithms to emulate human decision making using probabilistic distortions | — | 0
What Doubling Tricks Can and Can't Do for Multi-Armed Bandits | — | 0
Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs | — | 0
When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits | — | 0
Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | — | 0
Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task | — | 0
Worst-case Performance of Greedy Policies in Bandits with Imperfect Context Observations | — | 0
You Can Trade Your Experience in Distributed Multi-Agent Multi-Armed Bandits | — | 0
A Survey on Practical Applications of Multi-Armed and Contextual Bandits | — | 0
Zero-Inflated Bandits | — | 0
Functional multi-armed bandit and the best function identification problems | — | 0
A Bandit Approach to Sequential Experimental Design with False Discovery Control | — | 0
A Batch Sequential Halving Algorithm without Performance Degradation | — | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | — | 0
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond | — | 0
Access Probability Optimization in RACH: A Multi-Armed Bandits Approach | — | 0
Page 41 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified