SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed amount of resources must be allocated among competing choices so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying under-sampled arms against repeatedly playing the arm that currently looks best.

(Image credit: Microsoft Research)
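The exploration/exploitation trade-off can be illustrated with a minimal ε-greedy strategy on a Bernoulli bandit. This is an illustrative sketch, not the method of any paper listed below; the arm success probabilities and ε value are assumed for demonstration.

```python
import random

def epsilon_greedy(true_probs, epsilon=0.1, steps=10_000, seed=0):
    """Run an epsilon-greedy policy on a Bernoulli bandit.

    With probability epsilon, explore a uniformly random arm;
    otherwise, exploit the arm with the highest empirical mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms      # number of pulls per arm
    values = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=values.__getitem__)   # exploit
        reward = 1 if rng.random() < true_probs[arm] else 0
        counts[arm] += 1
        # incremental update of the running mean for this arm
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return values, total_reward

# Hypothetical arm success probabilities; the best arm is index 2.
values, total = epsilon_greedy([0.2, 0.5, 0.7])
```

After enough pulls the empirical means concentrate around the true probabilities, so the policy exploits the best arm (index 2 here) on roughly a 1 − ε fraction of steps. Many of the papers below refine exactly this balance, e.g. via contextual information, posterior sampling, or regret bounds.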

Papers

Showing 501–550 of 1262 papers

Title | Status | Hype
Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees | — | 0
PAC-Bayesian Offline Contextual Bandits With Guarantees | — | 0
Conditionally Risk-Averse Contextual Bandits | Code | 0
Fast Beam Alignment via Pure Exploration in Multi-armed Bandits | Code | 0
Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles | Code | 0
Vertical Federated Linear Contextual Bandits | — | 0
Contextual bandits with concave rewards, and an application to fair ranking | — | 0
Simulated Contextual Bandits for Personalization Tasks from Recommendation Datasets | Code | 0
Maximum entropy exploration in contextual bandits with neural networks and energy based models | — | 0
Constant regret for sequence prediction with limited advice | — | 0
Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs | — | 0
ProtoBandit: Efficient Prototype Selection via Multi-Armed Bandits | — | 0
Replicable Bandits | — | 0
On Best-Arm Identification with a Fixed Budget in Non-Parametric Multi-Armed Bandits | — | 0
Off-Policy Risk Assessment in Markov Decision Processes | — | 0
Active Inference for Autonomous Decision-Making with Contextual Multi-Armed Bandits | — | 0
Towards Robust Off-Policy Evaluation via Human Inputs | — | 0
Constrained Policy Optimization for Controlled Self-Learning in Conversational AI Systems | — | 0
Risk-aware linear bandits with convex loss | — | 0
Double Doubly Robust Thompson Sampling for Generalized Linear Contextual Bandits | — | 0
Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health | — | 0
When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits | — | 0
Multi-Armed Bandits with Self-Information Rewards | — | 0
Exposure-Aware Recommendation using Contextual Bandits | — | 0
Variational Inference for Model-Free and Model-Based Reinforcement Learning | — | 0
Dynamic Global Sensitivity for Differentially Private Contextual Bandits | — | 0
A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning | — | 0
Understanding the stochastic dynamics of sequential decision-making processes: A path-integral analysis of multi-armed bandits | — | 0
Increasing Students' Engagement to Reminder Emails Through Multi-Armed Bandits | — | 0
Nonstationary Continuum-Armed Bandit Strategies for Automated Trading in a Simulated Financial Market | Code | 0
Raising Student Completion Rates with Adaptive Curriculum and Contextual Bandits | — | 0
Towards Soft Fairness in Restless Multi-Armed Bandits | — | 0
SPRT-based Efficient Best Arm Identification in Stochastic Bandits | — | 0
Online Learning with Off-Policy Feedback | — | 0
Parallel Best Arm Identification in Heterogeneous Environments | — | 0
Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces | Code | 0
Contextual Bandits with Large Action Spaces: Made Practical | Code | 0
Online SuBmodular + SuPermodular (BP) Maximization with Bandit Feedback | Code | 0
Model Selection in Reinforcement Learning with General Function Approximations | — | 0
Instance-optimal PAC Algorithms for Contextual Bandits | — | 0
Autonomous Drug Design with Multi-Armed Bandits | — | 0
Ranking In Generalized Linear Bandits | Code | 0
Two-Stage Neural Contextual Bandits for Personalised News Recommendation | Code | 0
Joint Representation Training in Sequential Tasks with Shared Structure | — | 0
Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms | — | 0
On Private Online Convex Optimization: Optimal Algorithms in ℓp-Geometry and High Dimensional Contextual Bandits | Code | 0
Combinatorial Pure Exploration of Causal Bandits | — | 0
A Contextual Combinatorial Semi-Bandit Approach to Network Bottleneck Identification | — | 0
Distributed Differential Privacy in Multi-Armed Bandits | — | 0
Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits | — | 0
Page 11 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified