
Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed, limited amount of resources must be allocated among competing choices so as to maximize expected gain, when each choice's properties are only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: gathering more information about the alternatives versus acting on the best information gathered so far.

(Image credit: Microsoft Research)
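A minimal illustration of the exploration/exploitation trade-off is the epsilon-greedy strategy: with probability epsilon the learner pulls a random arm (explore), otherwise it pulls the arm with the highest observed mean reward (exploit). The sketch below is illustrative only; the Bernoulli arm means and the value of epsilon are made-up assumptions, not taken from any paper or benchmark on this page.

```python
import random

# Hedged sketch of an epsilon-greedy bandit policy. The three Bernoulli arm
# means and epsilon below are made-up values for illustration.
true_means = [0.2, 0.5, 0.7]          # assumed arm means (unknown to the learner)
n_arms = len(true_means)
counts = [0] * n_arms                  # pulls per arm
values = [0.0] * n_arms                # running mean reward per arm
epsilon = 0.1                          # exploration probability

for _ in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(n_arms)                     # explore
    else:
        arm = max(range(n_arms), key=lambda a: values[a])  # exploit
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean

print("pulls per arm:", counts)
print("estimated means:", [round(v, 3) for v in values])
```

With a constant epsilon the learner keeps sampling suboptimal arms at a fixed rate, incurring regret that grows linearly in the number of rounds; this is one motivation for the UCB- and Thompson-sampling-style methods that appear throughout the paper list below, which achieve logarithmic regret guarantees.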

Papers

Showing 801–850 of 1262 papers

Meta-Learning surrogate models for sequential decision making
Meta-Prompt Optimization for LLM-Based Sequential Decision Making
Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
Meta-Thompson Sampling
Metric-Free Individual Fairness with Cooperative Contextual Bandits
Minimax Off-Policy Evaluation for Multi-Armed Bandits
Minimax-optimal trust-aware multi-armed bandits
Minimax Policy for Heavy-tailed Bandits
Mitigating Bias in Adaptive Data Gathering via Differential Privacy
Modeling Attrition in Recommender Systems with Departing Bandits
Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits
Modelling Cournot Games as Multi-agent Multi-armed Bandits
Model selection for behavioral learning data and applications to contextual bandits
Model Selection for Generic Contextual Bandits
Model Selection in Contextual Stochastic Bandit Problems
Model Selection in Reinforcement Learning with General Function Approximations
Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis
More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning
More Robust Doubly Robust Off-policy Evaluation
Mortal Multi-Armed Bandits
Multi-agent Multi-armed Bandits with Stochastic Sharable Arm Capacities
Multi-Agent Multi-Armed Bandits with Limited Communication
Multi-agent Multi-armed Bandit with Fully Heavy-tailed Dynamics
Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions
Multi-armed Bandit Learning for TDMA Transmission Slot Scheduling and Defragmentation for Improved Bandwidth Usage
Multi-Armed Bandits and Quantum Channel Oracles
Multi-armed Bandits: Competing with Optimal Sequences
Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback
Multi-Armed Bandits for Intelligent Tutoring Systems
Multi-armed Bandits for Link Configuration in Millimeter-wave Networks
Multi-Armed Bandits for Minesweeper: Profiting from Exploration-Exploitation Synergy
Multi-Armed Bandits in Metric Spaces
Multi-Armed Bandits Meet Large Language Models
Multi-armed bandits on implicit metric spaces
Multi-Armed Bandits on Partially Revealed Unit Interval Graphs
Multi-Armed Bandits with Abstention
Multi-armed Bandits with Application to 5G Small Cells
Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization
Multi-Armed Bandits with Censored Consumption of Resources
Multi-armed Bandits with Compensation
Multi-armed Bandits with Cost Subsidy
Multi-Armed Bandits with Dependent Arms
Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards
Multi-Armed Bandits with Interference
Multi-Armed Bandits with Local Differential Privacy
Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards
Multi-Armed Bandits with Metric Movement Costs
Multi-Armed Bandits with Self-Information Rewards
Multi-Fidelity Multi-Armed Bandits Revisited
Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness
Page 17 of 26

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 | – | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 | – | Unverified
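For context on the metric: cumulative regret is conventionally the gap between the reward an oracle always playing the best arm would have collected and the reward the algorithm actually collected, summed over all rounds. The sketch below computes the expected (pseudo-)regret from per-arm pull counts, assuming that standard definition; the arm means and pull counts are illustrative numbers, not data from this benchmark.

```python
# Hedged sketch of cumulative (pseudo-)regret. The arm means and pull
# counts are illustrative assumptions, not values from the table above.
true_means = [0.2, 0.5, 0.7]   # expected reward of each arm
pulls = [120, 340, 9540]       # how often each arm was played

best = max(true_means)
regret = sum(n * (best - mu) for n, mu in zip(pulls, true_means))
print(regret)  # 120*0.5 + 340*0.2 + 9540*0.0 = 128.0
```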