Multi-modal Page Stream Segmentation with Convolutional Neural Networks

2019-09-27 · Lang Resources & Evaluation 2019 · Code Available

Gregor Wiedemann, Gerhard Heyer


Code

github.com/uhh-lt/pss-lrev

Abstract

In recent years, (retro-)digitizing paper-based files has become a major undertaking for private and public archives as well as an important task in electronic mailroom applications. The workflow usually begins with batch scanning and optical character recognition (OCR) of documents. In the case of multi-page documents, preserving the document context is a major requirement. To facilitate workflows involving very large amounts of paper scans, page stream segmentation (PSS) is the task of automatically separating a stream of scanned images into coherent multi-page documents. In a digitization project together with a German federal archive, we developed a novel approach for PSS based on convolutional neural networks (CNN). As the first project to do so, we combine visual information from scanned images with semantic information from OCR-ed texts for this task. The multi-modal combination of features in a single classification architecture allows for major improvements towards optimal document separation. Beyond multimodality, our PSS approach benefits from transfer learning and sequential page modeling. We achieve an accuracy of up to 95% on multi-page documents on our in-house dataset and up to 93% on a publicly available dataset.
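
To make the task definition concrete, the following minimal sketch shows how per-page predictions could be turned into documents. It assumes that PSS is framed as a binary decision per page ("this page starts a new document" versus "this page continues the previous one"). That framing, the function name `segment_stream`, and the example predictions are illustrative assumptions, not taken from the paper or its repository, and the classifier itself is not shown.

```python
from typing import List, Sequence


def segment_stream(starts_new_doc: Sequence[bool]) -> List[List[int]]:
    """Group page indices into documents from per-page boundary predictions.

    starts_new_doc[i] is True if page i is predicted to open a new document
    (hypothetical classifier output, assumed for illustration).
    """
    documents: List[List[int]] = []
    for page_idx, is_start in enumerate(starts_new_doc):
        # The first page always opens a document, whatever the prediction says.
        if is_start or not documents:
            documents.append([page_idx])
        else:
            documents[-1].append(page_idx)
    return documents


# Hypothetical predictions for a stream of six scanned pages.
predictions = [True, False, False, True, True, False]
print(segment_stream(predictions))  # [[0, 1, 2], [3], [4, 5]]
```

Under this framing, a single wrong boundary prediction either splits one document in two or merges two neighbouring documents, which is why per-page accuracy matters for the quality of the resulting separation.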

Tasks

Optical Character Recognition (OCR), Page Stream Segmentation, Transfer Learning
