SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's reward is only partially known and becomes better understood as resources are allocated to it. These problems typically involve an exploration/exploitation trade-off: sampling arms to learn their rewards versus pulling the arm that currently looks best.

(Image credit: Microsoft Research)
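The trade-off above can be sketched with an epsilon-greedy strategy, one of the simplest bandit algorithms: with a small probability explore a random arm, otherwise exploit the arm with the best estimated reward. This is an illustrative sketch on a synthetic Bernoulli bandit, not an implementation of any specific paper listed below; the function and parameter names are my own.

```python
import random

def epsilon_greedy_bandit(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit.

    With probability `epsilon` a random arm is explored; otherwise the
    arm with the highest estimated mean reward is exploited.
    Returns the per-arm reward estimates and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        # Bernoulli reward drawn from the (hidden) true mean of the arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, total_reward

# Usage: three arms with hidden success probabilities 0.2, 0.5, 0.8.
estimates, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best_arm = max(range(3), key=lambda a: estimates[a])
```

With enough steps the agent concentrates its pulls on the highest-mean arm while the epsilon fraction of exploratory pulls keeps the other estimates from going stale.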

Papers

Showing 601-625 of 1262 papers

Title | Hype
Instance-optimal PAC Algorithms for Contextual Bandits | 0
Concentrated Differential Privacy for Bandits | 0
Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward | 0
BanditRank: Learning to Rank Using Contextual Bandits | 0
A conversion theorem and minimax optimality for continuum contextual bandits | 0
Learning to Use Learners' Advice | 0
Contextual Information-Directed Sampling | 0
Investigating Gender Fairness in Machine Learning-driven Personalized Care for Chronic Pain | 0
Contextual Linear Bandits with Delay as Payoff | 0
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement | 0
Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? | 0
Is Reinforcement Learning More Difficult Than Bandits? A Near-optimal Algorithm Escaping the Curse of Horizon | 0
Joint Representation Training in Sequential Tasks with Shared Structure | 0
Contextual Multi-Armed Bandits for Causal Marketing | 0
Kernel-based Multi-Task Contextual Bandits in Cellular Network Configuration | 0
Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits | 0
Kernel ε-Greedy for Multi-Armed Bandits with Covariates | 0
Kernel Methods for Cooperative Multi-Agent Contextual Bandits | 0
KL-regularization Itself is Differentially Private in Bandits and RLHF | 0
Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits | 0
Contextual Pandora's Box | 0
Lagrangian Index Policy for Restless Bandits with Average Reward | 0
Confidence-Budget Matching for Sequential Budgeted Learning | 0
Langevin Thompson Sampling with Logarithmic Communication: Bandits and Reinforcement Learning | 0
From Bandits to Experts: A Tale of Domination and Independence | 0
Page 25 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified