SOTAVerified

Multi-Armed Bandits

The multi-armed bandit is a problem in which a fixed, limited set of resources must be allocated among competing alternatives (arms) so as to maximize expected gain, when each arm's reward distribution is only partially known and is learned as it is played. These problems typically involve an exploration/exploitation trade-off.
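As an illustration of the exploration/exploitation trade-off, here is a minimal sketch of the classic ε-greedy strategy on a Bernoulli bandit. The arm means, round count, and ε below are illustrative choices, not taken from any paper listed on this page:

```python
import random

def epsilon_greedy(true_means, n_rounds=10000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit.

    true_means: per-arm reward probabilities (hidden from the agent).
    Returns the agent's estimated mean reward and pull count per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Explore: pull a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the best current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough rounds, the agent pulls the best arm (mean 0.8) most often while still spending a small ε fraction of rounds sampling the others, which is the trade-off in its simplest form; most of the papers below refine this idea with regret guarantees, structure on the arms, or constraints.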

(Image credit: Microsoft Research)

Papers

Showing 201–250 of 1262 papers

Title | Status | Hype
Boundary Crossing Probabilities for General Exponential Families |  | 0
Bounded Regret for Finitely Parameterized Multi-Armed Bandits |  | 0
Breaking the log(1/Δ_2) Barrier: Better Batched Best Arm Identification with Adaptive Grids |  | 0
Breaking the √T Barrier: Instance-Independent Logarithmic Regret in Stochastic Contextual Linear Bandits |  | 0
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism |  | 0
Budget-Constrained Multi-Armed Bandits with Multiple Plays |  | 0
Budgeted Combinatorial Multi-Armed Bandits |  | 0
An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays |  | 0
Budgeted Recommendation with Delayed Feedback |  | 0
Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens |  | 0
Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits under Realizability |  | 0
Bypassing the Simulator: Near-Optimal Adversarial Linear Contextual Bandits |  | 0
Byzantine-Resilient Decentralized Multi-Armed Bandits |  | 0
A Gang of Bandits |  | 0
An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints |  | 0
Catoni Contextual Bandits are Robust to Heavy-tailed Rewards |  | 0
Causal Bandits: Online Decision-Making in Endogenous Settings |  | 0
A General Reduction for High-Probability Analysis with General Light-Tailed Distributions |  | 0
Causal Contextual Bandits with Targeted Interventions |  | 0
Causal Feature Selection Method for Contextual Multi-Armed Bandits in Recommender System |  | 0
AdaLinUCB: Opportunistic Learning for Contextual Bandits |  | 0
Contextual Bandits in Payment Processing: Non-uniform Exploration and Supervised Learning at Adyen |  | 0
A framework for optimizing COVID-19 testing policy using a Multi Armed Bandit approach |  | 0
Classical Bandit Algorithms for Entanglement Detection in Parameterized Qubit States |  | 0
Clustered Linear Contextual Bandits with Knapsacks |  | 0
COBRA: Contextual Bandit Algorithm for Ensuring Truthful Strategic Agents |  | 0
Parallel Best Arm Identification in Heterogeneous Environments |  | 0
Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-Armed Bandits |  | 0
Collaborative Min-Max Regret in Grouped Multi-Armed Bandits |  | 0
Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits |  | 0
Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits |  | 0
Adversarial Attacks on Adversarial Bandits |  | 0
Top-k Combinatorial Bandits with Full-Bandit Feedback |  | 0
Bayesian Analysis of Combinatorial Gaussian Process Bandits |  | 0
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing |  | 0
A Regret bound for Non-stationary Multi-Armed Bandits with Fairness Constraints |  | 0
Combinatorial Multi-armed Bandits for Real-Time Strategy Games |  | 0
Combinatorial Multi-Armed Bandits with Filtered Feedback |  | 0
Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond |  | 0
Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards |  | 0
Combinatorial Pure Exploration of Multi-Armed Bandits |  | 0
Combinatorial Pure Exploration with Full-bandit Feedback and Beyond: Solving Combinatorial Optimization under Uncertainty with Limited Observation |  | 0
Combinatorial Semi-Bandits with Knapsacks |  | 0
Combining Difficulty Ranking with Multi-Armed Bandits to Sequence Educational Content |  | 0
A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity |  | 0
Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support |  | 0
Adversarial Bandits with Knapsacks |  | 0
Communication Efficient Distributed Learning for Kernelized Contextual Bandits |  | 0
Comparative Performance of Collaborative Bandit Algorithms: Effect of Sparsity and Exploration Intensity |  | 0
Balanced Linear Contextual Bandits |  | 0
Page 5 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 |  | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 |  | Unverified