Queueing Matching Bandits with Preference Feedback

2024-10-14Code Available0· sign in to hype

Jung-hun Kim, Min-hwan Oh

Code Available — Be the first to reproduce this paper.

Code

github.com/junghunkim7786/queueing-matching-bandits-with-preference-feedback
OfficialIn papernone★ 0

Abstract

In this study, we consider multi-class multi-server asymmetric queueing systems consisting of N queues on one side and K servers on the other side, where jobs randomly arrive in queues at each time. The service rate of each job-server assignment is unknown and modeled by a feature-based Multi-nomial Logit (MNL) function. At each time, a scheduler assigns jobs to servers, and each server stochastically serves at most one job based on its preferences over the assigned jobs. The primary goal of the algorithm is to stabilize the queues in the system while learning the service rates of servers. To achieve this goal, we propose algorithms based on UCB and Thompson Sampling, which achieve system stability with an average queue length bound of O( ,K\/) for a large time horizon T, where is a traffic slackness of the system. Furthermore, the algorithms achieve sublinear regret bounds of O( Q_,T^3/4\), where Q_ represents the maximum queue length over agents and times. Lastly, we provide experimental results to demonstrate the performance of our algorithms.

Tasks

Thompson Sampling

Queueing Matching Bandits with Preference Feedback

Code

Abstract

Tasks

Reproductions