Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition

2014-06-18

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun

Code

Repository	Framework	Stars
github.com/yueruchen/sppnet-pytorch	pytorch	327
github.com/daveboat/spp	pytorch	3
github.com/zhanghuiyao/yolov5_mindspore	mindspore	1
github.com/addisonklinke/pytorch-architectures/blob/master/torcharch/conv.py	pytorch	0
github.com/tdhock/feature-learning-benchmark	none	0
github.com/Mind23-2/MindCode-88/tree/main/SPPNet	mindspore	0
github.com/xiamenwcy/extended-caffe	none	0
github.com/alegonz/kdsb17	tf	0
github.com/mindspore-ai/models/blob/master/research/cv/SPPNet	mindspore	0
github.com/yangyucheng000/ms_cv/tree/main/SPPNet	mindspore	0

Abstract

Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224x224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with another pooling strategy, "spatial pyramid pooling", to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. Pyramid pooling is also robust to object deformations. With these advantages, SPP-net should in general improve all CNN-based image classification methods. On the ImageNet 2012 dataset, we demonstrate that SPP-net boosts the accuracy of a variety of CNN architectures despite their different designs. On the Pascal VOC 2007 and Caltech101 datasets, SPP-net achieves state-of-the-art classification results using a single full-image representation and no fine-tuning. The power of SPP-net is also significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method is 24-102x faster than the R-CNN method, while achieving better or comparable accuracy on Pascal VOC 2007. In ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014, our methods rank #2 in object detection and #3 in image classification among all 38 teams. This manuscript also introduces the improvement made for this competition.
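The fixed-length property described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names `spp_max_pool` and `spp_region`, the pyramid levels (1, 2, 4), and the floor/ceil bin boundaries are all assumptions chosen for illustration, and a single 2D list stands in for one channel of a convolutional feature map.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2D feature map (list of rows) into a fixed-length vector.

    For each pyramid level n (hypothetical choice: 1, 2, 4), the map is split
    into an n x n grid of bins and the maximum of each bin is kept, so the
    output always has sum(n * n for n in levels) entries.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Rows [floor(i*h/n), ceil((i+1)*h/n)); assumed boundary rule,
            # which keeps every bin non-empty even when h < n.
            r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


def spp_region(feature_map, top, left, bottom, right, levels=(1, 2, 4)):
    """Pool an arbitrary window of an already-computed feature map.

    Mirrors the detection idea: the map is computed once for the whole image,
    then each candidate region is cropped from it and pooled to a fixed length.
    """
    window = [row[left:right] for row in feature_map[top:bottom]]
    return spp_max_pool(window, levels)


# Two maps of different sizes give vectors of the same length (1 + 4 + 16 = 21).
small = [[r * 5 + c for c in range(5)] for r in range(4)]      # 4 x 5
large = [[(r * c) % 7 for c in range(13)] for r in range(9)]   # 9 x 13
print(len(spp_max_pool(small)), len(spp_max_pool(large)))      # 21 21
print(len(spp_region(large, 2, 3, 7, 11)))                     # 21
```

Because the number of bins depends only on the pyramid levels, a fully connected layer placed after this pooling step sees the same input length for any image or region size. In a real network the same pooling would be applied per channel and the results concatenated.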

Tasks

General Classification, Image Classification, Object Detection, Object Recognition

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
PASCAL VOC 2007	SPP (combination)	mAP	60.9	-	Unverified
