SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a task in which a fixed, limited amount of resources must be allocated between competing choices in a way that maximizes expected gain. These problems typically involve an exploration/exploitation trade-off.

(Image credit: Microsoft Research)
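The exploration/exploitation trade-off described above can be illustrated with a minimal epsilon-greedy sketch. This is not the method of any listed paper; the arm reward probabilities, epsilon, and horizon below are arbitrary assumptions for illustration:

```python
import random

def epsilon_greedy(probs, steps=10_000, eps=0.1, seed=0):
    """Run epsilon-greedy on Bernoulli arms with success rates `probs`.

    With probability `eps` explore a random arm; otherwise exploit the
    arm with the highest empirical mean reward so far.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # incremental update of the arm's empirical mean
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total, counts

total, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough steps, the highest-paying arm (index 2 here) ends up pulled most often, while the small exploration rate keeps estimates of the other arms from going stale.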

Papers

Showing 151–200 of 1,262 papers

Title | Status | Hype
Linear Contextual Bandits with Interference | | 0
Second Order Bounds for Contextual Bandits with Function Approximation | | 0
Designing an Interpretable Interface for Contextual Bandits | | 0
Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System | | 0
Partially Observable Contextual Bandits with Linear Payoffs | | 0
Batched Online Contextual Sparse Bandits with Sequential Inclusion of Features | | 0
Batch Ensemble for Variance Dependent Regret in Stochastic Bandits | | 0
A Hybrid Meta-Learning and Multi-Armed Bandit Approach for Context-Specific Multi-Objective Recommendation Optimization | | 0
Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis | | 0
Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes | | 0
Faster Q-Learning Algorithms for Restless Bandits | | 0
Performance-Aware Self-Configurable Multi-Agent Networks: A Distributed Submodular Approach for Simultaneous Coordination and Network Design | Code | 0
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits | | 0
Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications | | 0
Representative Arm Identification: A fixed confidence approach to identify cluster representatives | | 0
Online Fair Division with Contextual Bandits | | 0
Dynamic Product Image Generation and Recommendation at Scale for Personalized E-commerce | | 0
Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards | | 0
Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities | | 0
Contextual Bandits for Unbounded Context Distributions | | 0
GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits | | 0
Reciprocal Learning | | 0
Hierarchical Multi-Armed Bandits for the Concurrent Intelligent Tutoring of Concepts and Problems of Varying Difficulty Levels | Code | 0
Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits | Code | 0
Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents | Code | 0
Empathic Responding for Digital Interpersonal Emotion Regulation via Content Recommendation | | 0
Online Learning for Autonomous Management of Intent-based 6G Networks | | 0
Identifiable latent bandits: Combining observational data and exploration for personalized healthcare | | 0
Scalable Exploration via Ensemble++ | Code | 0
Satisficing Exploration for Deep Reinforcement Learning | | 0
Open Problem: Tight Bounds for Kernelized Multi-Armed Bandits with Bernoulli Rewards | | 0
On Speeding Up Language Model Evaluation | | 0
Honor Among Bandits: No-Regret Learning for Online Fair Division | | 0
A Contextual Combinatorial Bandit Approach to Negotiation | | 0
Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States | | 0
Jump Starting Bandits with LLM-Generated Prior Knowledge | Code | 0
EduQate: Generating Adaptive Curricula through RMABs in Education Settings | | 0
BEACON: Balancing Convenience and Nutrition in Meals With Long-Term Group Recommendations and Reasoning on Multimodal Recipes | | 0
Towards Bayesian Data Selection | | 0
Discovering Minimal Reinforcement Learning Environments | Code | 1
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions | | 0
An Adaptive Method for Contextual Stochastic Multi-armed Bandits with Rewards Generated by a Linear Dynamical System | | 0
Linear Contextual Bandits with Hybrid Payoff: Revisited | Code | 0
Towards Domain Adaptive Neural Contextual Bandits | | 0
A Federated Online Restless Bandit Framework for Cooperative Resource Allocation | | 0
Asymptotically Optimal Regret for Black-Box Predict-then-Optimize | | 0
Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning | | 0
A conversion theorem and minimax optimality for continuum contextual bandits | | 0
Data-Driven Upper Confidence Bounds with Near-Optimal Regret for Heavy-Tailed Bandits | | 0
Adaptively Learning to Select-Rank in Online Platforms | | 0
Page 4 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
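Cumulative regret, the metric reported above, is the total reward lost relative to always pulling the best arm: after T pulls it is the sum over steps of (best arm's mean minus the pulled arm's mean). A minimal sketch, assuming the true arm means are known (the means and pull sequence below are illustrative, not taken from the benchmark):

```python
def cumulative_regret(arm_means, pulls):
    """Sum of per-step gaps between the best arm's mean and the pulled arm's mean."""
    best = max(arm_means)
    return sum(best - arm_means[a] for a in pulls)

# Three arms with means 0.2, 0.5, 0.8; the learner pulls arm 0 once,
# arm 1 once, then settles on the best arm (index 2).
# The arm-0 pull costs 0.6, the arm-1 pull costs 0.3, and pulls
# of the best arm cost nothing.
regret = cumulative_regret([0.2, 0.5, 0.8], [0, 1, 2, 2, 2])
```

A good algorithm's cumulative regret grows sublinearly in T, since it concentrates its pulls on the best arm over time; lower is better, which is how the table rows are ordered.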