Hierarchical Roofline Performance Analysis for Deep Learning Applications

2020-09-11

Charlene Yang, Yunsong Wang, Steven Farrell, Thorsten Kurth, Samuel Williams


Abstract

This paper presents a practical methodology for collecting the performance data needed to conduct hierarchical Roofline analysis on NVIDIA GPUs. It describes how the Empirical Roofline Toolkit was extended to support a broader range of data precisions and Tensor Cores, and it introduces an Nsight Compute-based method for accurately collecting application performance information. Together, these allow automated characterization of both the machine and the application for Roofline analysis across the entire memory hierarchy of NVIDIA GPUs. The methodology is validated on a complex deep learning application for climate image segmentation. We use two versions of the code, one in TensorFlow and one in PyTorch, to demonstrate the use and effectiveness of this methodology. We highlight how the application utilizes the GPU's compute and memory capabilities and how the implementation and performance differ between the two deep learning frameworks.
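The arithmetic behind a hierarchical Roofline can be illustrated with a short sketch. For each memory level, arithmetic intensity (AI) is the kernel's FLOP count divided by the bytes moved at that level, and the Roofline ceiling is min(peak compute, AI × that level's bandwidth). In the paper, the ceilings come from the Empirical Roofline Toolkit and the FLOP and byte counts come from Nsight Compute; this sketch calls neither. The `MemoryLevel` type, the function names, and every number below are hypothetical and were chosen only for illustration. Supporting a different precision or Tensor Cores would, under this simplified model, amount to passing a different `peak_flops` value.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    """One level of the GPU memory hierarchy (e.g. L1, L2, HBM)."""
    name: str
    bytes_moved: float     # bytes the kernel moved through this level (hypothetical input)
    peak_bandwidth: float  # bandwidth ceiling for this level, bytes/s (hypothetical input)


def hierarchical_roofline(flops, runtime_s, peak_flops, levels):
    """Return achieved FLOP/s and, per level, (arithmetic intensity, roofline ceiling).

    Arithmetic intensity is FLOPs per byte at that level; the ceiling is
    min(peak compute, AI * level bandwidth), the classic Roofline bound.
    """
    achieved = flops / runtime_s
    per_level = {}
    for level in levels:
        ai = flops / level.bytes_moved
        per_level[level.name] = (ai, min(peak_flops, ai * level.peak_bandwidth))
    return achieved, per_level


def binding_level(per_level):
    """Name of the level with the lowest ceiling, i.e. the tightest bound."""
    return min(per_level, key=lambda name: per_level[name][1])


if __name__ == "__main__":
    # Hypothetical counter values and ceilings, for illustration only.
    levels = [
        MemoryLevel("L1", bytes_moved=4e9, peak_bandwidth=1.4e13),
        MemoryLevel("L2", bytes_moved=1e9, peak_bandwidth=4e12),
        MemoryLevel("HBM", bytes_moved=2e8, peak_bandwidth=9e11),
    ]
    achieved, per_level = hierarchical_roofline(2e9, 1e-3, 1e13, levels)
    print(f"achieved: {achieved:.3g} FLOP/s")
    for name, (ai, ceiling) in per_level.items():
        print(f"{name}: AI={ai:.3g} FLOP/B, ceiling={ceiling:.3g} FLOP/s")
    print("tightest ceiling at:", binding_level(per_level))
```

With these made-up inputs, the kernel's AI rises from L1 to HBM because fewer bytes reach the outer levels, so the lowest ceiling lies at L1. Keeping per-level byte counts separate, rather than reducing them to a single DRAM intensity, is what makes the analysis hierarchical.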
