
The orchestration of Machine Learning frameworks with data streams and GPU acceleration in Kafka-ML: A deep-learning performance comparative

2023-03-15 · Expert Systems (Wiley) 2023 · Code Available

Antonio Jesús Chaves, Cristian Martín, Manuel Díaz


Abstract

Machine Learning (ML) applications need large volumes of data to train their models so that they can make high-quality predictions. Given digital revolution enablers such as the Internet of Things (IoT) and Industry 4.0, this data is generated in large quantities as continuous data streams rather than as the static datasets that most Artificial Intelligence (AI) frameworks expect. Kafka-ML is a novel open-source framework that allows the complete management of ML/AI pipelines through data streams. In this article, we present new features for the Kafka-ML framework, such as support for the well-known ML/AI framework PyTorch and for GPU acceleration at different points along the pipeline. The pipeline is described through a real Industry 4.0 use case in the petrochemical industry. Finally, a comprehensive evaluation with state-of-the-art deep learning models is carried out to demonstrate the feasibility of the platform.
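To make the contrast between stream-based and static-dataset training concrete, here is a minimal sketch in plain Python. It is not Kafka-ML's API and uses neither Kafka nor PyTorch: `sensor_stream`, `batches`, and `train_on_stream` are hypothetical names, the stream is simulated by a generator, and the model is a one-variable linear regression chosen only to keep the example self-contained. It shows a model being updated mini-batch by mini-batch as samples arrive, without the full dataset ever being held in memory.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[float, float]


def sensor_stream(n: int, seed: int = 0) -> Iterator[Sample]:
    """Hypothetical stand-in for a data stream: yields (x, y) with y close to 3x + 1."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)


def batches(stream: Iterable[Sample], size: int) -> Iterator[List[Sample]]:
    """Group an incoming stream into mini-batches without materialising it."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def train_on_stream(stream: Iterable[Sample], batch_size: int = 32,
                    lr: float = 0.1) -> Tuple[float, float]:
    """One pass of mini-batch gradient descent on squared error for y = w*x + b."""
    w, b = 0.0, 0.0
    for batch in batches(stream, batch_size):
        n = len(batch)
        # Gradients of the mean squared error over the current mini-batch.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in batch) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    w, b = train_on_stream(sensor_stream(20_000))
    print(f"w={w:.3f}, b={b:.3f}")
```

In a real deployment the generator would be replaced by a consumer reading from a stream topic, and the hand-written update by a deep learning framework's optimizer step, optionally on a GPU; those parts are deliberately left out here.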
