Unrolled Memory Inner-Products: An Abstract GPU Operator for Efficient Vision-Related Computations

2017-10-01 · ICCV 2017

Yu-Sheng Lin, Wei-Chao Chen, Shao-Yi Chien

Abstract

Recently, convolutional neural networks (CNNs) have achieved great success in fields such as computer vision, natural language processing, and artificial intelligence. Many of these applications use parallel processing on GPUs to achieve higher performance. However, optimizing for GPUs remains a daunting task, and most researchers have to rely on vendor-provided libraries for this purpose. In this paper, we discuss an operator that can succinctly express computational kernels in CNNs and in various scientific and vision applications. This operator, called Unrolled-Memory-Inner-Product (UMI), is computationally efficient and requires fewer code tokens. Since a naive UMI implementation would increase the memory requirement through input data unrolling, we propose a method to achieve optimal memory fetch performance on modern GPUs. We demonstrate this operator by converting several popular applications into the UMI representation and achieve a 1.3x-26.4x speedup over frameworks such as OpenCV and Caffe.
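
The abstract does not define UMI formally, so the following is only a minimal sketch of the general idea it names: unroll the input into overlapping windows, then take an inner product of each window with a weight vector. It is a plain im2col-style illustration in pure Python, not the authors' operator or their GPU implementation, and the function names (`unroll_patches`, `inner_products`, `convolve_via_unrolling`) are hypothetical.

```python
def unroll_patches(image, k):
    """Copy every k x k window of a 2D list into its own flat row.

    This is the naive unrolling step: overlapping windows are duplicated,
    so the unrolled buffer is larger than the input.
    """
    h, w = len(image), len(image[0])
    rows = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            rows.append([image[y + dy][x + dx]
                         for dy in range(k) for dx in range(k)])
    return rows


def inner_products(rows, weights):
    """Inner product of each unrolled row with one flat weight vector."""
    return [sum(a * b for a, b in zip(row, weights)) for row in rows]


def convolve_via_unrolling(image, kernel):
    """Valid-mode 2D cross-correlation (no kernel flip, as in CNNs)."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]
    values = inner_products(unroll_patches(image, k), flat_kernel)
    out_w = len(image[0]) - k + 1
    # Reshape the flat result back into output rows.
    return [values[i:i + out_w] for i in range(0, len(values), out_w)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = convolve_via_unrolling(image, kernel)  # [[6, 8], [12, 14]]
```

In this toy case the 9 input values become 4 unrolled rows of 4 values each (16 in total), which illustrates the memory growth that the abstract attributes to naive unrolling.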

Tasks

GPU, Rolling Shutter Correction
