
Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning

2025-04-29

Jinsun Yoo, ChonLam Lao, Lianjie Cao, Bob Lantz, Minlan Yu, Tushar Krishna, Puneet Sharma

Abstract

This paper lays the foundation for Genie, a testing framework that captures the impact of real hardware network behavior on ML workload performance without requiring expensive GPUs. Genie uses CPU-initiated traffic over a hardware testbed to emulate GPU-to-GPU communication, and adapts the ASTRA-sim simulator to model the interaction between the network and the ML workload.
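
As a rough illustration of the kind of GPU-to-GPU traffic pattern that CPU-initiated senders would have to reproduce, the sketch below lists the point-to-point sends of a ring all-reduce. The abstract does not say which collectives or algorithms Genie replays, so the choice of ring all-reduce, the `Send` record, and the `ring_allreduce_schedule` function are all assumptions made for illustration, not part of Genie or ASTRA-sim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Send:
    """One point-to-point transfer in the schedule (hypothetical record type)."""
    step: int    # communication step index, starting at 0
    src: int     # sending rank
    dst: int     # receiving rank
    nbytes: int  # payload size in bytes


def ring_allreduce_schedule(num_ranks: int, message_bytes: int) -> list[Send]:
    """List the sends of a ring all-reduce over `num_ranks` ranks.

    Assumption: the standard ring algorithm, i.e. a reduce-scatter phase of
    (num_ranks - 1) steps followed by an all-gather phase of (num_ranks - 1)
    steps. In every step each rank sends one chunk to its right neighbour.
    """
    if num_ranks < 2:
        raise ValueError("a ring needs at least two ranks")
    if message_bytes <= 0:
        raise ValueError("message_bytes must be positive")

    # The buffer is split into num_ranks chunks; round up so no byte is lost.
    chunk_bytes = -(-message_bytes // num_ranks)

    sends = []
    for step in range(2 * (num_ranks - 1)):
        for src in range(num_ranks):
            dst = (src + 1) % num_ranks
            sends.append(Send(step, src, dst, chunk_bytes))
    return sends


if __name__ == "__main__":
    schedule = ring_allreduce_schedule(num_ranks=4, message_bytes=4096)
    per_rank = sum(s.nbytes for s in schedule if s.src == 0)
    print(f"{len(schedule)} sends, {per_rank} bytes sent per rank")
```

A traffic generator could walk such a schedule and issue each entry from a CPU host in place of the GPU that would normally send it. How Genie actually derives and times its traffic is not specified in the abstract.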
