
Memory Planning for Deep Neural Networks

2022-02-23

Maksim Levental


Abstract

We study memory allocation patterns in DNNs during inference, in the context of large-scale systems. We observe that, under multi-threading, these allocation patterns are subject to high latencies due to mutex contention in the system memory allocator, and that such contention produces undesirable bottlenecks in user-facing services. We therefore propose a "memoization"-based technique, MemoMalloc, for optimizing overall latency with only a moderate increase in peak memory usage. Specifically, our technique consists of a runtime component, which captures all allocations and uniquely associates them with their high-level source operations, and a static analysis component, which constructs an efficient allocation "plan". We present an implementation of MemoMalloc in the PyTorch deep learning framework and evaluate memory consumption and execution performance on a wide range of DNN architectures. We find that MemoMalloc outperforms state-of-the-art general-purpose memory allocators with respect to DNN inference latency by as much as 40%.
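The abstract gives no code, but the static-planning half of the technique can be illustrated with a minimal, hypothetical sketch. In the fragment below, all names (AllocRequest, plan_offsets, the greedy best-fit heuristic) are illustrative assumptions, not the MemoMalloc API: a profiling pass is assumed to have recorded each tensor's size and lifetime, and the planner packs those requests at fixed offsets in one pre-reserved arena, so inference can serve allocations from the plan without contending on a locking allocator.

```python
# Hypothetical sketch of static memory planning from a recorded profile.
# Phase 1 (not shown): a runtime pass captures each allocation's size and
# the interval of operations over which the tensor is live.
# Phase 2 (below): a greedy planner assigns each request a fixed offset in
# a single arena so that no two temporally-overlapping tensors share bytes.
from dataclasses import dataclass

@dataclass
class AllocRequest:
    size: int   # bytes requested (illustrative profile record)
    start: int  # index of the op that produces the tensor
    end: int    # index of the last op that reads the tensor

def overlaps(a: AllocRequest, b: AllocRequest) -> bool:
    """Two tensors conflict iff their lifetime intervals intersect."""
    return a.start <= b.end and b.start <= a.end

def plan_offsets(requests: list[AllocRequest]) -> tuple[dict[int, int], int]:
    """Place larger tensors first, at the lowest arena offset that does not
    collide (in both time and address range) with anything already placed."""
    placed: list[tuple[AllocRequest, int]] = []
    offsets: dict[int, int] = {}
    for idx, req in sorted(enumerate(requests), key=lambda p: -p[1].size):
        offset = 0
        for other, other_off in sorted(placed, key=lambda p: p[1]):
            # Bump past any live-range-conflicting block we would intersect.
            if overlaps(req, other) and offset + req.size > other_off:
                offset = max(offset, other_off + other.size)
        placed.append((req, offset))
        offsets[idx] = offset
    peak = max((off + requests[i].size for i, off in offsets.items()), default=0)
    return offsets, peak

# Example: profile of a tiny three-op network.
profile = [
    AllocRequest(size=1024, start=0, end=1),  # input activation
    AllocRequest(size=2048, start=1, end=2),  # hidden activation
    AllocRequest(size=1024, start=2, end=2),  # output; reuses the input's slot
]
offsets, peak = plan_offsets(profile)
print(offsets, peak)  # {1: 0, 0: 2048, 2: 2048} with a 3072-byte peak
```

Sorting requests by decreasing size before placement is a common heuristic in static memory planners. The full system described in the abstract additionally needs the runtime capture component, which uniquely associates each allocation with its originating high-level operation so the recorded profile can be replayed as a plan during inference.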
