
The YOLO model that still excels in document layout analysis

2023-08-22 · ResearchGate 2023

Qilin Deng, Mayire Ibrayim, Askar Hamdulla, Chunhu Zhang

Abstract

Document layout analysis helps readers understand and use the information in a document. However, the diversity of document layouts and the wide variation in aspect ratios among document objects pose significant challenges. In this study, we use the YOLO model as a baseline and design the Multi-Convolutional Deformable Separation (MCDS) module as the main structure of the network. Integrating this module into the Backbone and Neck layers strengthens image feature extraction. We also incorporate ParNet-Attention, which uses parallel networks to direct the network's focus toward document objects and so supports more thorough feature extraction. To improve prediction accuracy, the Head layer employs the Decouple Fusion Head (DFH), which builds on the decoupled head and leverages multi-scale features. The proposed model performs strongly on three public datasets with differing characteristics: ICDAR-POD, PubLayNet, and IIIT-AR-13K. Notably, on ICDAR-POD it achieves the best mean Average Precision (mAP) at IoU thresholds of 0.6 and 0.8, 96.2 and 94.4, respectively.
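
The IoU thresholds quoted in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `box_iou` and `is_match`, the `(x1, y1, x2, y2)` box format, and the example coordinates are all assumptions made for illustration. It only shows how one predicted box is judged against one ground-truth box at a given threshold, which is the building block that mAP at IoU 0.6 or 0.8 rests on.

```python
def box_iou(a, b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2
    (an assumed format, not one specified by the paper).
    """
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred, truth, threshold):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return box_iou(pred, truth) >= threshold


# Hypothetical example: a detected table region that misses the
# bottom 30% of the ground-truth region.
truth = (0, 0, 100, 100)
pred = (0, 0, 100, 70)
iou = box_iou(pred, truth)
accepted_at_06 = is_match(pred, truth, 0.6)
accepted_at_08 = is_match(pred, truth, 0.8)
```

In this example the same prediction is accepted at the 0.6 threshold and rejected at 0.8, which is why a score reported at the stricter threshold says more about localization quality.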
