SOTAVerified

Multi-Armed Bandits

Multi-armed bandits refer to the task of allocating a fixed amount of resources among competing choices (arms) so as to maximize expected gain, when each arm's reward distribution is only partially known at the time of allocation. These problems typically involve an exploration/exploitation trade-off: gathering information about uncertain arms versus playing the arm that currently looks best.
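The exploration/exploitation trade-off can be illustrated with a minimal epsilon-greedy sketch for Bernoulli-reward arms (the arm means, step count, and epsilon below are illustrative assumptions, not taken from any paper on this page):

```python
import random

def epsilon_greedy(true_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a K-armed Bernoulli bandit.

    With probability epsilon pick a random arm (explore);
    otherwise pick the arm with the highest estimated mean (exploit).
    Returns the per-arm mean estimates and the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniform random arm
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        # incremental mean update: est += (r - est) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, total_reward
```

With enough steps the estimates concentrate around the true arm means, and most pulls go to the best arm; more refined strategies (UCB, Thompson sampling) are what many of the papers listed below study.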

(Image credit: Microsoft Research)

Papers

Showing 826–850 of 1262 papers

Title (every paper on this page has a Hype score of 0; none has a verification status)

Multi-Armed Bandits and Quantum Channel Oracles
Multi-armed Bandits: Competing with Optimal Sequences
Multi-Armed Bandits for Correlated Markovian Environments with Smoothed Reward Feedback
Multi-Armed Bandits for Intelligent Tutoring Systems
Multi-armed Bandits for Link Configuration in Millimeter-wave Networks
Multi-Armed Bandits for Minesweeper: Profiting from Exploration-Exploitation Synergy
Multi-Armed Bandits in Metric Spaces
Multi-Armed Bandits Meet Large Language Models
Multi-armed bandits on implicit metric spaces
Multi-Armed Bandits on Partially Revealed Unit Interval Graphs
Multi-Armed Bandits with Abstention
Multi-armed Bandits with Application to 5G Small Cells
Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization
Multi-Armed Bandits with Censored Consumption of Resources
Multi-armed Bandits with Compensation
Multi-armed Bandits with Cost Subsidy
Multi-Armed Bandits with Dependent Arms
Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards
Multi-Armed Bandits with Interference
Multi-Armed Bandits with Local Differential Privacy
Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards
Multi-Armed Bandits with Metric Movement Costs
Multi-Armed Bandits with Self-Information Rewards
Multi-Fidelity Multi-Armed Bandits Revisited
Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness
Page 34 of 51

Benchmark Results

#  Model                          Metric             Claimed  Verified  Status
1  NeuralLinear FullPosterior-MR  Cumulative regret  1.92     -         Unverified
2  Linear FullPosterior-MR        Cumulative regret  1.82     -         Unverified
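Cumulative regret, the metric in the table above, is the total expected reward lost by the arms actually chosen relative to always playing the best arm. A minimal sketch (the example arm means and pull sequence are illustrative, not data from this benchmark):

```python
def cumulative_regret(true_means, chosen_arms):
    """Expected cumulative (pseudo-)regret of a sequence of arm pulls:
    sum over rounds of (best arm's mean - chosen arm's mean)."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in chosen_arms)
```

For example, with arm means [0.2, 0.8], choosing arms [0, 1, 0] incurs regret 0.6 + 0.0 + 0.6 = 1.2. Lower is better; a good algorithm's cumulative regret grows sublinearly in the number of rounds.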