SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to a class of problems in which a fixed amount of resources must be allocated among competing choices (arms) so as to maximize expected gain, when each choice's reward distribution is only partially known. These problems typically involve an exploration/exploitation trade-off: gathering more information about uncertain arms versus playing the arm that currently looks best.
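The trade-off can be illustrated with a minimal epsilon-greedy sketch on a Bernoulli bandit (all names and parameter values here are illustrative, not from any paper in the list below); the policy explores a random arm with probability epsilon and otherwise exploits the empirically best arm, and we track the cumulative (pseudo-)regret against always playing the best arm:

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Play a Bernoulli bandit with an epsilon-greedy policy.

    Returns the cumulative expected regret versus always pulling
    the arm with the highest true mean.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # pulls per arm
    values = [0.0] * n_arms        # empirical mean reward per arm
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # running mean
        regret += best_mean - true_means[arm]                  # pseudo-regret
    return regret

# Three arms with means 0.2, 0.5, 0.8; the policy should
# concentrate its pulls on the 0.8 arm over time.
print(epsilon_greedy([0.2, 0.5, 0.8]))
```

Cumulative regret of this kind is the metric reported in the benchmark results at the bottom of this page; lower is better.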

(Image credit: Microsoft Research)

Papers

Showing 801–825 of 1262 papers

| Title | Status | Hype |
| --- | --- | --- |
| Meta-Learning surrogate models for sequential decision making | | 0 |
| Meta-Prompt Optimization for LLM-Based Sequential Decision Making | | 0 |
| Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models | | 0 |
| Meta-Thompson Sampling | | 0 |
| Metric-Free Individual Fairness with Cooperative Contextual Bandits | | 0 |
| Minimax Off-Policy Evaluation for Multi-Armed Bandits | | 0 |
| Minimax-optimal trust-aware multi-armed bandits | | 0 |
| Minimax Policy for Heavy-tailed Bandits | | 0 |
| Mitigating Bias in Adaptive Data Gathering via Differential Privacy | | 0 |
| Modeling Attrition in Recommender Systems with Departing Bandits | | 0 |
| Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits | | 0 |
| Modelling Cournot Games as Multi-agent Multi-armed Bandits | | 0 |
| Model selection for behavioral learning data and applications to contextual bandits | | 0 |
| Model Selection for Generic Contextual Bandits | | 0 |
| Model Selection in Contextual Stochastic Bandit Problems | | 0 |
| Model Selection in Reinforcement Learning with General Function Approximations | | 0 |
| Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis | | 0 |
| More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning | | 0 |
| More Robust Doubly Robust Off-policy Evaluation | | 0 |
| Mortal Multi-Armed Bandits | | 0 |
| Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities | | 0 |
| Multi-Agent Multi-Armed Bandits with Limited Communication | | 0 |
| Multi-agent Multi-armed Bandit with Fully Heavy-tailed Dynamics | | 0 |
| Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions | | 0 |
| Multi-armed Bandit Learning for TDMA Transmission Slot Scheduling and Defragmentation for Improved Bandwidth Usage | | 0 |
Page 33 of 51

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | | Unverified |
| 2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | | Unverified |