SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed, limited amount of resources must be allocated among competing alternatives ("arms") so as to maximize expected gain, when each arm's payoff distribution is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: the learner must balance pulling arms to learn more about their payoffs against repeatedly pulling the arm that currently appears best.
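The trade-off above can be illustrated with a minimal epsilon-greedy sketch on a Bernoulli bandit (this is an illustrative example, not an algorithm from any paper listed below; the arm means `[0.2, 0.5, 0.8]` and `epsilon=0.1` are arbitrary choices):

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=10_000, seed=0):
    """Play a Bernoulli bandit: with probability epsilon pull a random arm
    (explore), otherwise pull the arm with the highest estimated mean
    (exploit). Returns per-arm estimates and the total reward collected."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)    # pulls per arm
    values = [0.0] * len(true_means)  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                           # explore
            arm = rng.randrange(len(true_means))
        else:                                                # exploit
            arm = max(range(len(true_means)), key=values.__getitem__)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, total

values, total = epsilon_greedy([0.2, 0.5, 0.8])
```

With enough steps the estimate for the best arm (true mean 0.8) dominates, so exploitation concentrates pulls on it while the epsilon fraction of random pulls keeps the other estimates from going stale.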

(Image credit: Microsoft Research)

Papers

Showing 151–200 of 1262 papers

Title | Status | Hype
A Convex Framework for Confounding Robust Inference | Code | 0
Contextual bandits with entropy-based human feedback | Code | 0
Contextual Bandits with Stochastic Experts | Code | 0
Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits | Code | 0
Adaptive Data Depth via Multi-Armed Bandits | Code | 0
Empirical Likelihood for Contextual Bandits | Code | 0
RoME: A Robust Mixed-Effects Bandit Algorithm for Optimizing Mobile Health Interventions | Code | 0
Estimation of Warfarin Dosage with Reinforcement Learning | Code | 0
Flooding with Absorption: An Efficient Protocol for Heterogeneous Bandits over Complex Networks | Code | 0
Federated Multi-armed Bandits with Personalization | Code | 0
Finding All ε-Good Arms in Stochastic Bandits | Code | 0
Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits | Code | 0
Conditionally Risk-Averse Contextual Bandits | Code | 0
Gaussian Gated Linear Networks | Code | 0
Group Meritocratic Fairness in Linear Contextual Bandits | Code | 0
Batched Multi-armed Bandits Problem | Code | 0
Combinatorial Multi-armed Bandits for Resource Allocation | Code | 0
Combinatorial Bandits under Strategic Manipulations | Code | 0
Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents | Code | 0
Hierarchical Multi-Armed Bandits for the Concurrent Intelligent Tutoring of Concepts and Problems of Varying Difficulty Levels | Code | 0
Confidence Intervals for Policy Evaluation in Adaptive Experiments | Code | 0
Identification of the Generalized Condorcet Winner in Multi-dueling Bandits | Code | 0
Causal Contextual Bandits with Adaptive Context | Code | 0
Cascading Bandits for Large-Scale Recommendation Problems | Code | 0
Introduction to Multi-Armed Bandits | Code | 0
Bayesian Design Principles for Frequentist Sequential Learning | Code | 0
Bayesian Optimisation over Multiple Continuous and Categorical Inputs | Code | 0
Inverse Contextual Bandits: Learning How Behavior Evolves over Time | Code | 0
Scalable Exploration via Ensemble++ | Code | 0
Kernel Conditional Moment Constraints for Confounding Robust Inference | Code | 0
Causally Abstracted Multi-armed Bandits | Code | 0
Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals | Code | 0
Nonstationary Continuum-Armed Bandit Strategies for Automated Trading in a Simulated Financial Market | Code | 0
Let's Get It Started: Fostering the Discoverability of New Releases on Deezer | Code | 0
Model selection for contextual bandits | Code | 0
Locally Differentially Private (Contextual) Bandits Learning | Code | 0
An Empirical Evaluation of Federated Contextual Bandit Algorithms | Code | 0
Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery | Code | 0
Best Arm Identification with Fixed Budget: A Large Deviation Perspective | Code | 0
Adaptive Linear Estimating Equations | Code | 0
Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback | Code | 0
Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes | Code | 0
Distribution oblivious, risk-aware algorithms for multi-armed bandits with unbounded rewards | Code | 0
More Robust Doubly Robust Off-policy Evaluation | Code | 0
Bandit-Based Monte Carlo Optimization for Nearest Neighbors | Code | 0
Multi-agent Multi-armed Bandits with Minimum Reward Guarantee Fairness | Code | 0
An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits | Code | 0
Multi-Armed Bandits in Brain-Computer Interfaces | Code | 0
Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting | Code | 0
Decentralized Cooperative Stochastic Bandits | Code | 0
Page 4 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | — | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | — | Unverified