Bridging Data Center AI Systems with Edge Computing for Actionable Information Retrieval

2021-05-28

Zhengchun Liu, Ahsan Ali, Peter Kenesei, Antonino Miceli, Hemant Sharma, Nicholas Schwarz, Dennis Trujillo, Hyunseung Yoo, Ryan Coffee, Naoufal Layad, Jana Thayer, Ryan Herbst, ChunHong Yoon, Ian Foster


Code

github.com/AISDC/CookieNetAE
Official, in paper, PyTorch, ★ 3
github.com/AISDC/DNNTrainerFlow
Official, in paper, PyTorch, ★ 2

Abstract

Extremely high data rates at modern synchrotron and X-ray free-electron laser light source beamlines motivate the use of machine learning methods for data reduction, feature detection, and other purposes. Regardless of the application, the basic concept is the same: data collected in early stages of an experiment, data from past similar experiments, and/or data simulated for the upcoming experiment are used to train machine learning models that, in effect, learn specific characteristics of those data; these models are then used to process subsequent data more efficiently than would general-purpose models that lack knowledge of the specific dataset or data class. Thus, a key challenge is to train models rapidly enough that they can be deployed and used within useful timescales. We describe here how specialized data center AI (DCAI) systems can be used for this purpose through a geographically distributed workflow. Experiments show that although using remote DCAI systems for DNN training incurs data movement costs and service overhead, the turnaround time is still less than 1/30 of that of a locally deployable GPU.
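The trade-off stated at the end of the abstract can be illustrated with a minimal sketch. It assumes that remote turnaround is simply the sum of data movement, service overhead, remote training, and returning the trained model to the edge. The class name, field names, and all numbers below are hypothetical placeholders chosen for illustration; they are not measurements or code from the paper.

```python
from dataclasses import dataclass


@dataclass
class RemoteTrainingCost:
    """Hypothetical cost breakdown (seconds) for training on a remote DCAI system."""
    data_out_s: float    # move training data from the edge to the data center
    overhead_s: float    # service overhead (scheduling, setup, and similar)
    train_s: float       # DNN training time on the DCAI system
    model_back_s: float  # return the trained model to the edge

    def turnaround_s(self) -> float:
        # Assumption: the stages run one after another, so their times add up.
        return self.data_out_s + self.overhead_s + self.train_s + self.model_back_s


def speedup(local_train_s: float, remote: RemoteTrainingCost) -> float:
    """Ratio of local GPU training time to total remote turnaround time."""
    return local_train_s / remote.turnaround_s()


# Placeholder values only, not results from the paper.
remote = RemoteTrainingCost(data_out_s=10.0, overhead_s=5.0, train_s=20.0, model_back_s=5.0)
print(f"remote turnaround: {remote.turnaround_s():.1f} s")
print(f"speedup over local GPU: {speedup(1600.0, remote):.1f}x")
```

In this model the remote path wins whenever the extra data movement and overhead are small compared with the training time saved, which is the situation the abstract reports.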

Tasks

BIG-bench Machine Learning, Edge-computing, GPU, Information Retrieval, Retrieval
