SOTAVerified

Multi-Armed Bandits

Multi-armed bandits are a class of sequential decision problems in which a fixed amount of resources must be allocated among competing alternatives so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
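The exploration/exploitation trade-off described above can be illustrated with a minimal sketch of UCB1, one classic bandit algorithm. The arm means, horizon, and seed below are illustrative assumptions, not taken from any paper on this page.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms; return per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # times each arm was pulled
    totals = [0.0] * k    # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # exploit the empirical mean, but add an exploration bonus
            # that shrinks as an arm is pulled more often
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

Over 2000 rounds the arm with the highest mean (0.8) should accumulate the bulk of the pulls, while the weaker arms are sampled just often enough to keep their confidence bounds tight.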

Papers

Showing 1001–1050 of 1262 papers

Title | Status | Hype
Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits | | 0
Value Directed Exploration in Multi-Armed Bandits with Structured Priors | | 0
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency | | 0
Variance-Dependent Regret Lower Bounds for Contextual Bandits | | 0
Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits | | 0
Variational Inference for Model-Free and Model-Based Reinforcement Learning | | 0
Versatile Dueling Bandits: Best-of-both-World Analyses for Online Learning from Preferences | | 0
Vertical Federated Linear Contextual Bandits | | 0
Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits | | 0
Bandit algorithms to emulate human decision making using probabilistic distortions | | 0
What Doubling Tricks Can and Can't Do for Multi-Armed Bandits | | 0
Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs | | 0
When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits | | 0
Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | | 0
Why so gloomy? A Bayesian explanation of human pessimism bias in the multi-armed bandit task | | 0
Worst-case Performance of Greedy Policies in Bandits with Imperfect Context Observations | | 0
You Can Trade Your Experience in Distributed Multi-Agent Multi-Armed Bandits | | 0
A Survey on Practical Applications of Multi-Armed and Contextual Bandits | | 0
Zero-Inflated Bandits | | 0
Functional multi-armed bandit and the best function identification problems | | 0
A Bandit Approach to Sequential Experimental Design with False Discovery Control | | 0
A Batch Sequential Halving Algorithm without Performance Degradation | | 0
Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits | | 0
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond | | 0
Access Probability Optimization in RACH: A Multi-Armed Bandits Approach | | 0
Accurate and Fast Federated Learning via Combinatorial Multi-Armed Bandits | | 0
A Central Limit Theorem, Loss Aversion and Multi-Armed Bandits | | 0
Achieving adaptivity and optimality for multi-armed bandits using Exponential-Kullback Leibler Maillard Sampling | | 0
Achieving User-Side Fairness in Contextual Bandits | | 0
A Classification View on Meta Learning Bandits | | 0
A Closer Look at Small-loss Bounds for Bandits with Graph Feedback | | 0
A Contextual Combinatorial Bandit Approach to Negotiation | | 0
A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | | 0
A Correction of Pseudo Log-Likelihood Method | | 0
Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits | | 0
Active Reinforcement Learning: Observing Rewards at a Cost | | 0
Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling | | 0
Active Search for Sparse Signals with Region Sensing | | 0
Active Velocity Estimation using Light Curtains via Self-Supervised Multi-Armed Bandits | | 0
AdaLinUCB: Opportunistic Learning for Contextual Bandits | | 0
AdaptEx: A Self-Service Contextual Bandit Platform | | 0
Adapting Bandit Algorithms for Settings with Sequentially Available Arms | | 0
Adapting to Delays and Data in Adversarial Multi-Armed Bandits | | 0
Adapting to Misspecification in Contextual Bandits with Offline Regression Oracles | | 0
Adapting to Misspecification in Contextual Bandits | | 0
Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits | | 0
Adaptive Budgeted Multi-Armed Bandits for IoT with Dynamic Resource Constraints | | 0
Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems | | 0
Adaptive Data Augmentation for Thompson Sampling | | 0
Adaptive Discretization against an Adversary: Lipschitz bandits, Dynamic Pricing, and Auction Tuning | | 0
Page 21 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified