SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of tasks in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain options versus committing to the option that currently looks best.
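The exploration/exploitation trade-off can be illustrated with the classic ε-greedy strategy: with probability ε pull a random arm (explore), otherwise pull the arm with the highest estimated mean reward (exploit). A minimal sketch on a Bernoulli bandit follows; the arm means, step count, and ε value are illustrative, not drawn from any paper listed below.

```python
import random

def epsilon_greedy(true_means, steps=10000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit.

    true_means: per-arm success probabilities (illustrative values).
    Returns the estimated mean reward and pull count for each arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]

    return values, counts
```

With a small constant ε, the best arm ends up pulled far more often than the others, while every arm still receives enough exploratory pulls for its estimate to converge.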

(Image credit: Microsoft Research)

Papers

Showing 701–750 of 1262 papers

Title (each paper below is listed with a Hype score of 0 and no verification status)

High-dimensional Nonparametric Contextual Bandit Problem
High Probability Bound for Cross-Learning Contextual Bandits with Unknown Context Distributions
Encrypted Linear Contextual Bandit
Honor Among Bandits: No-Regret Learning for Online Fair Division
Horde of Bandits using Gaussian Markov Random Fields
How Does Variance Shape the Regret in Contextual Bandits?
Human-AI Learning Performance in Multi-Armed Bandits
Hypothesis Transfer in Bandits by Weighted Models
Identifiable latent bandits: Combining observational data and exploration for personalized healthcare
Imitation-Regularized Offline Learning
Imprecise Multi-Armed Bandits
Improved Algorithms for Adversarial Bandits with Unbounded Losses
Improved Algorithms for Misspecified Linear Markov Decision Processes
Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback
Improved Best-of-Both-Worlds Guarantees for Multi-Armed Bandits: FTRL with General Regularizers and Multiple Optimal Arms
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs
Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing
A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit
Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards
Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
Improving Fairness in Adaptive Social Exergames via Shapley Bandits
Improving Offline Contextual Bandits with Distributional Robustness
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits
Incentivising Exploration and Recommendations for Contextual Bandits with Payments
Incentivized Exploration for Multi-Armed Bandits under Reward Drift
Incentivized Exploration via Filtered Posterior Sampling
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits
Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
Individual Regret in Cooperative Stochastic Multi-Armed Bandits
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits
Inference for Batched Bandits
Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective
Instance-optimal PAC Algorithms for Contextual Bandits
Concentrated Differential Privacy for Bandits
Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement
Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible?
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon
Joint Representation Training in Sequential Tasks with Shared Structure
Kernel-based Multi-Task Contextual Bandits in Cellular Network Configuration
Kernel ε-Greedy for Multi-Armed Bandits with Covariates
Kernel Methods for Cooperative Multi-Agent Contextual Bandits
KL-regularization Itself is Differentially Private in Bandits and RLHF
Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits
Lagrangian Index Policy for Restless Bandits with Average Reward
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning
Latent Contextual Bandits and their Application to Personalized Recommendations for New Users
Page 15 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
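The cumulative regret metric reported above is the total gap between the expected reward of the best arm and the expected rewards of the arms actually played. A minimal sketch of the computation follows; the arm means and pull sequence are hypothetical and not taken from the benchmark.

```python
def cumulative_regret(arm_means, plays):
    """Sum of (best arm's mean - played arm's mean) over a pull sequence.

    arm_means: expected reward of each arm (illustrative values).
    plays: sequence of arm indices actually pulled.
    """
    best = max(arm_means)
    return sum(best - arm_means[arm] for arm in plays)

# Hypothetical run: arm 1 is best (mean 1.0); two pulls of arm 0
# each contribute a 0.5 gap, so the cumulative regret is 1.0.
regret = cumulative_regret([0.5, 1.0], [0, 0, 1])
```

Pulling only the optimal arm yields zero regret, so lower cumulative regret indicates a better bandit policy.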