SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed budget of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's payoff is only partially known and is learned through repeated play. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain arms versus pulling the arm that currently looks best.
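The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch (not a method from any specific paper listed below; arm means, epsilon, and step count are illustrative assumptions):

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy bandit: explore a random arm with probability
    epsilon, otherwise exploit the arm with the best running estimate."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0   # Bernoulli reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm] # update mean
        total_reward += reward
    return estimates, total_reward

est, total = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough steps the running estimates concentrate around the true arm means, and play shifts toward the best arm while occasional exploration continues.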

(Image credit: Microsoft Research)

Papers

Showing 726–750 of 1262 papers

Title | Status | Hype
Incentivized Exploration for Multi-Armed Bandits under Reward Drift | | 0
Incentivized Exploration via Filtered Posterior Sampling | | 0
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits | | 0
Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits | | 0
Indexed Minimum Empirical Divergence-Based Algorithms for Linear Bandits | | 0
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits | | 0
Individual Regret in Cooperative Stochastic Multi-Armed Bandits | | 0
In-Domain African Languages Translation Using LLMs and Multi-armed Bandits | | 0
Inference for Batched Bandits | | 0
Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective | | 0
Instance-optimal PAC Algorithms for Contextual Bandits | | 0
Concentrated Differential Privacy for Bandits | | 0
Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain | | 0
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement | | 0
Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? | | 0
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon | | 0
Joint Representation Training in Sequential Tasks with Shared Structure | | 0
Kernel-based Multi-Task Contextual Bandits in Cellular Network Configuration | | 0
Kernel ε-Greedy for Multi-Armed Bandits with Covariates | | 0
Kernel Methods for Cooperative Multi-Agent Contextual Bandits | | 0
KL-regularization Itself is Differentially Private in Bandits and RLHF | | 0
Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits | | 0
Lagrangian Index Policy for Restless Bandits with Average Reward | | 0
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | | 0
Latent Contextual Bandits and their Application to Personalized Recommendations for New Users | | 0
Page 30 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
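The benchmark metric above, cumulative regret, measures the total expected reward lost by not always pulling the best arm. A minimal sketch of how it is computed (the arm means and pull sequence here are illustrative, not from the benchmark):

```python
def cumulative_regret(true_means, chosen_arms):
    """Expected cumulative regret: the gap between the best arm's mean
    and the chosen arm's mean, summed over all pulls."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)

# Two arms with means 0.2 and 0.8; each pull of the worse arm costs 0.6
regret = cumulative_regret([0.2, 0.8], [0, 0, 1, 1])
```

Lower cumulative regret is better; an algorithm that identifies the best arm quickly keeps the sum of per-pull gaps small.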