SOTAVerified

Multi-Armed Bandits

The multi-armed bandit problem is a task in which a fixed, limited set of resources must be allocated among competing alternatives so as to maximize expected gain. These problems typically involve an exploration/exploitation trade-off: the learner must balance trying arms to learn their payoffs against repeatedly pulling the arm that currently looks best.

(Image credit: Microsoft Research)
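As a concrete illustration of the exploration/exploitation trade-off, here is a minimal epsilon-greedy sketch. It is not taken from any of the papers listed below; the Bernoulli arm means, epsilon, and horizon are illustrative assumptions.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=1000, seed=0):
    """Minimal epsilon-greedy bandit: explore with probability epsilon,
    otherwise exploit the arm with the highest estimated mean reward.
    Arms pay Bernoulli rewards with the (unknown to the learner) true_means."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k          # pulls per arm
    estimates = [0.0] * k     # running mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                           # explore: random arm
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        total += reward
    return estimates, total

# Usage: three hypothetical Bernoulli arms; the best arm (mean 0.8)
# should end up with the highest estimate and most of the pulls.
est, total = epsilon_greedy([0.2, 0.5, 0.8])
```

With epsilon fixed, the policy never stops exploring, so its cumulative regret grows linearly; the contextual and best-of-both-worlds algorithms in the papers below aim for sublinear regret instead.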

Papers

Showing 751–775 of 1262 papers

Title | Status | Hype
LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits |  | 0
Learning and Fairness in Energy Harvesting: A Maximin Multi-Armed Bandits Approach |  | 0
Learning-Based User Association for MmWave Vehicular Networks With Kernelized Contextual Bandits |  | 0
Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect |  | 0
Learning Neural Contextual Bandits Through Perturbed Rewards |  | 0
Learning diverse rankings with multi-armed bandits |  | 0
Learning Effective Exploration Strategies For Contextual Bandits |  | 0
Learning How to Price Charging in Electric Ride-Hailing Markets |  | 0
Learning in Generalized Linear Contextual Bandits with Stochastic Delays |  | 0
Learning in Restless Multi-Armed Bandits via Adaptive Arm Sequencing Rules |  | 0
Learning Multiple Tasks in Parallel with a Shared Annotator |  | 0
Learning Personalized Decision Support Policies |  | 0
Learning to Actively Learn: A Robust Approach |  | 0
Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems |  | 0
Learning to Explore with Lagrangians for Bandits under Unknown Linear Constraints |  | 0
Learning to Optimize Energy Efficiency in Energy Harvesting Wireless Sensor Networks |  | 0
Learning to Rank in the Position Based Model with Bandit Feedback |  | 0
Learning to Search Better Than Your Teacher |  | 0
Learning to Use Learners' Advice |  | 0
Lenient Regret for Multi-Armed Bandits |  | 0
Lessons from Contextual Bandit Learning in a Customer Support Bot |  | 0
Leveraging (Biased) Information: Multi-armed Bandits with Offline Data |  | 0
Leveraging Good Representations in Linear Contextual Bandits |  | 0
Leveraging heterogeneous spillover in maximizing contextual bandit rewards |  | 0
Leveraging User-Triggered Supervision in Contextual Bandits |  | 0
Page 31 of 51

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | NeuralLinear FullPosterior-MR | Cumulative regret | 1.92 |  | Unverified
2 | Linear FullPosterior-MR | Cumulative regret | 1.82 |  | Unverified
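The metric in this table, cumulative regret, is conventionally the sum over rounds of the gap between the best arm's mean reward and the mean reward of the arm actually pulled. A minimal sketch of that computation follows; the arm means and pull sequence here are illustrative assumptions, not the benchmark's data.

```python
def cumulative_regret(true_means, pulls):
    """Cumulative (pseudo-)regret: for each round, add the gap between the
    best arm's mean and the mean of the arm that was pulled that round."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in pulls)

# Usage: best arm has mean 0.8, so pulling it costs nothing; each pull of
# the 0.5 arm adds 0.3 and each pull of the 0.2 arm adds 0.6.
r = cumulative_regret([0.2, 0.5, 0.8], [2, 2, 1, 0])  # ≈ 0 + 0 + 0.3 + 0.6
```

A policy that eventually concentrates on the best arm accumulates regret at a sublinear rate, which is why a smaller cumulative regret (1.82 vs. 1.92 above) indicates the better-performing model.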