SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a sequential decision task in which a fixed budget of resources must be allocated among competing alternatives (arms) so as to maximize expected gain, when each arm's reward distribution is only partially known. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain arms versus pulling the arm that currently looks best.
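The exploration/exploitation trade-off can be illustrated with epsilon-greedy, one of the simplest bandit strategies: with probability epsilon pull a random arm, otherwise pull the arm with the best empirical mean. This is a minimal sketch, not taken from any paper listed below; the function name, Bernoulli rewards, and parameter values are illustrative.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=1000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit with the given true arm means.

    Returns (empirical mean estimates, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniformly random arm
        else:
            arm = max(range(n_arms), key=values.__getitem__)  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean for the pulled arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, counts, total_reward
```

Over a long enough horizon the arm with the highest true mean accumulates the most pulls, while the epsilon fraction of random pulls keeps every arm's estimate from going stale.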

(Image credit: Microsoft Research)

Papers

Showing 1201–1250 of 1262 papers

Title | Status | Hype
Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States | | 0
Clustered Linear Contextual Bandits with Knapsacks | | 0
COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents | | 0
Parallel Best Arm Identification in Heterogeneous Environments | | 0
Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-Armed Bandits | | 0
Collaborative Min-Max Regret in Grouped Multi-Armed Bandits | | 0
Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits | | 0
Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits | | 0
Top-k Combinatorial Bandits with Full-Bandit Feedback | | 0
Bayesian Analysis of Combinatorial Gaussian Process Bandits | | 0
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing | | 0
Combinatorial Multi-armed Bandits for Real-Time Strategy Games | | 0
Combinatorial Multi-Armed Bandits with Filtered Feedback | | 0
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond | | 0
Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards | | 0
Combinatorial Pure Exploration of Multi-Armed Bandits | | 0
Combinatorial Pure Exploration with Full-bandit Feedback and Beyond: Solving Combinatorial Optimization under Uncertainty with Limited Observation | | 0
Combinatorial Semi-Bandits with Knapsacks | | 0
Combining Difficulty Ranking with Multi-Armed Bandits to Sequence Educational Content | | 0
Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support | | 0
Communication Efficient Distributed Learning for Kernelized Contextual Bandits | | 0
Comparative Performance of Collaborative Bandit Algorithms: Effect of Sparsity and Exploration Intensity | | 0
Competing Bandits in Matching Markets | | 0
Competing Bandits: The Perils of Exploration Under Competition | | 0
Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs | | 0
Concurrent Decentralized Channel Allocation and Access Point Selection using Multi-Armed Bandits in multi BSS WLANs | | 0
Confidence-Budget Matching for Sequential Budgeted Learning | | 0
Conformal Off-Policy Prediction in Contextual Bandits | | 0
Conservative Contextual Bandits: Beyond Linear Representations | | 0
Constant regret for sequence prediction with limited advice | | 0
Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems | | 0
Constrained Pure Exploration Multi-Armed Bandits with a Fixed Budget | | 0
Context-Aware Bandits | | 0
Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms | | 0
Contextual Bandit Applications in Customer Support Bot | | 0
Contextual Bandits and Imitation Learning via Preference-Based Active Queries | | 0
Contextual Bandits and Optimistically Universal Learning | | 0
Contextual Bandits Evolving Over Finite Time | | 0
Contextual Bandits for adapting to changing User preferences over time | | 0
Contextual Bandits for Advertising Budget Allocation | | 0
Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version) | | 0
Contextual Bandits for Evaluating and Improving Inventory Control Policies | | 0
Contextual Bandits for Unbounded Context Distributions | | 0
Contextual Bandits in a Survey Experiment on Charitable Giving: Within-Experiment Outcomes versus Policy Learning | | 0
Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning at Adyen | | 0
Linear Bandits with Stochastic Delayed Feedback | | 0
Contextual Bandits with Arm Request Costs and Delays | | 0
Contextual Bandits with Budgeted Information Reveal | | 0
Contextual bandits with concave rewards, and an application to fair ranking | | 0
Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting | | 0
Page 25 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified
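Cumulative regret, the metric reported above, measures the gap between the reward of always pulling the best arm and the reward of the arms a policy actually chose. A minimal sketch of the standard (pseudo-)regret computation follows; the function name and example inputs are illustrative, not tied to the benchmark entries above.

```python
def cumulative_regret(arm_means, chosen_arms):
    """Pseudo-regret over a run: sum over rounds of (best mean - mean of chosen arm).

    arm_means:   true expected reward of each arm
    chosen_arms: index of the arm pulled at each round
    """
    best = max(arm_means)
    return sum(best - arm_means[arm] for arm in chosen_arms)
```

For example, with arms of mean 0.2 and 0.8, a run that pulls the suboptimal arm twice incurs a pseudo-regret of 2 x 0.6 = 1.2, regardless of the random rewards actually observed. Lower values indicate a policy that locks onto the best arm faster.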