
Multi-Armed Bandits

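The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices (the "arms") so as to maximize expected gain. Each arm's payoff is only partially known at the time of allocation, so these problems typically involve an exploration/exploitation trade-off: trying arms to learn more about them versus playing the arm that currently looks best.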
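As a rough illustration of the exploration/exploitation trade-off, the sketch below runs an epsilon-greedy strategy on a simulated Bernoulli bandit. It is not taken from any of the papers listed here. The function name `epsilon_greedy`, the arm probabilities, and the default parameters are all assumptions chosen for the example.

```python
import random


def epsilon_greedy(true_means, n_rounds=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    true_means: hypothetical success probability of each arm. The learner
    never reads these directly; they are used only to simulate rewards
    and to score regret afterwards.
    Returns (pull counts per arm, estimated means, cumulative regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulate a 0/1 reward from the chosen arm.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        # Expected regret of this pull relative to the best arm.
        regret += best_mean - true_means[arm]

    return counts, estimates, regret


if __name__ == "__main__":
    counts, estimates, regret = epsilon_greedy([0.2, 0.5, 0.7])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("cumulative regret:", round(regret, 1))
```

A larger `epsilon` spends more pulls on exploration and a smaller one commits earlier to the arm that currently looks best. Many of the papers in the list below study more refined strategies, such as UCB or Thompson sampling variants, for this same trade-off.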


Papers

Showing 751–800 of 1262 papers

Title | Status | Hype
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits |  | 0
Learning and Fairness in Energy Harvesting: A Maximin Multi-Armed Bandits Approach |  | 0
Learning-Based User Association for MmWave Vehicular Networks With Kernelized Contextual Bandits |  | 0
Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect |  | 0
Learning Neural Contextual Bandits Through Perturbed Rewards |  | 0
Learning diverse rankings with multi-armed bandits |  | 0
Learning Effective Exploration Strategies For Contextual Bandits |  | 0
Learning How to Price Charging in Electric Ride-Hailing Markets |  | 0
Learning in Generalized Linear Contextual Bandits with Stochastic Delays |  | 0
Learning in Restless Multi-Armed Bandits via Adaptive Arm Sequencing Rules |  | 0
Learning Multiple Tasks in Parallel with a Shared Annotator |  | 0
Learning Personalized Decision Support Policies |  | 0
Learning to Actively Learn: A Robust Approach |  | 0
Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems |  | 0
Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints |  | 0
Learning to Optimize Energy Efficiency in Energy Harvesting Wireless Sensor Networks |  | 0
Learning to Rank in the Position Based Model with Bandit Feedback |  | 0
Learning to Search Better Than Your Teacher |  | 0
Learning to Use Learners' Advice |  | 0
Lenient Regret for Multi-Armed Bandits |  | 0
Lessons from Contextual Bandit Learning in a Customer Support Bot |  | 0
Leveraging (Biased) Information: Multi-armed Bandits with Offline Data |  | 0
Leveraging Good Representations in Linear Contextual Bandits |  | 0
Leveraging heterogeneous spillover in maximizing contextual bandit rewards |  | 0
Leveraging User-Triggered Supervision in Contextual Bandits |  | 0
Lifelong Learning in Multi-Armed Bandits |  | 0
Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits |  | 0
lil' UCB : An Optimal Exploration Algorithm for Multi-Armed Bandits |  | 0
Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design |  | 0
Linear Contextual Bandits with Adversarial Corruptions |  | 0
Linear Contextual Bandits with Interference |  | 0
Linear Contextual Bandits with Knapsacks |  | 0
Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms |  | 0
LLMs-augmented Contextual Bandit |  | 0
Local Clustering in Contextual Multi-Armed Bandits |  | 0
Local Differential Privacy for Sequential Decision Making in a Changing Environment |  | 0
(Locally) Differentially Private Combinatorial Semi-Bandits |  | 0
Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits |  | 0
Making Contextual Decisions with Low Technical Debt |  | 0
Mathematics of statistical sequential decision-making: concentration, risk-awareness and modelling in stochastic bandits, with applications to bariatric surgery |  | 0
Maximum entropy exploration in contextual bandits with neural networks and energy based models |  | 0
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization |  | 0
Max-Utility Based Arm Selection Strategy For Sequential Query Recommendations |  | 0
MBExplainer: Multilevel bandit-based explanations for downstream models with augmented graph embeddings |  | 0
Achieving PAC Guarantees in Mechanism Design through Multi-Armed Bandits |  | 0
Meet Me at the Arm: The Cooperative Multi-Armed Bandits Problem with Shareable Arms |  | 0
Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models |  | 0
Meta-learners' learning dynamics are unlike learners' |  | 0
Meta-Learning Adversarial Bandit Algorithms |  | 0
Meta-Learning Adversarial Bandits |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 |  | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 |  | Unverified