Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification

2017-04-11

Muhammad Zeshan Afzal, Andreas Kölsch, Sheraz Ahmed, Marcus Liwicki

Code

github.com/tuannamnguyen93/DFKI_test_PhD (TensorFlow)
github.com/iamarjunchandra/LayoutLM-Form-Understanding---Sequence-Labeling (PyTorch)
github.com/tuannamnguyen93/test_PhD (TensorFlow)
github.com/BordiaS/layoutlm (PyTorch)
github.com/microsoft/unilm/tree/master/layoutlm (PyTorch)

Abstract

We present an exhaustive investigation of recent Deep Learning architectures, algorithms, and strategies for the task of document image classification to finally reduce the error by more than half. Existing approaches, such as the DeepDocClassifier, apply standard Convolutional Network architectures with transfer learning from the object recognition domain. The contribution of the paper is threefold: First, it investigates recently introduced very deep neural network architectures (GoogLeNet, VGG, ResNet) using transfer learning (from real images). Second, it proposes transfer learning from a huge set of document images, i.e., 400,000 documents. Third, it analyzes the impact of the amount of training data (document images) and other parameters on the classification abilities. We use two datasets, the Tobacco-3482 and the large-scale RVL-CDIP dataset. We achieve an accuracy of 91.13% on the Tobacco-3482 dataset, while earlier approaches reach only 77.6%. Thus, a relative error reduction of more than 60% is achieved. For the large dataset RVL-CDIP, an accuracy of 90.97% is achieved, corresponding to a relative error reduction of 11.5%.
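The relative error reductions quoted in the abstract can be checked with a few lines of arithmetic. The sketch below assumes the usual definition, (baseline error - new error) / baseline error, with error taken as 100 minus accuracy; the function names are illustrative and do not come from the paper. The second helper inverts the formula to estimate the baseline accuracy implied by a stated reduction, a derived figure that the abstract does not state.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline error removed; accuracies are in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return (baseline_err - new_err) / baseline_err


def implied_baseline_accuracy(new_acc: float, reduction: float) -> float:
    """Baseline accuracy (percent) implied by a new accuracy and a
    relative error reduction given as a fraction. Derived, not quoted."""
    new_err = 100.0 - new_acc
    baseline_err = new_err / (1.0 - reduction)
    return 100.0 - baseline_err


if __name__ == "__main__":
    # Tobacco-3482: 77.6% (earlier approaches) vs. 91.13% (this paper).
    tobacco = relative_error_reduction(77.6, 91.13)
    print(f"Tobacco-3482 relative error reduction: {tobacco:.1%}")

    # RVL-CDIP: 90.97% with a stated 11.5% relative error reduction.
    rvl_baseline = implied_baseline_accuracy(90.97, 0.115)
    print(f"RVL-CDIP implied baseline accuracy: {rvl_baseline:.2f}%")
```

For Tobacco-3482 the error falls from 22.4% to 8.87%, a reduction of roughly 60.4%, which is consistent with the "more than 60%" claim.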

Tasks

Document Image Classification, General Classification, Image Classification, Object Recognition, Transfer Learning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
RVL-CDIP	Transfer Learning from AlexNet, VGG-16, GoogLeNet and ResNet50	Accuracy	90.97	—	Unverified