Scene Text Image Super-Resolution via Parallelly Contextual Attention Network
Zhao Cairong, Feng Shuyang, Zhao Brian Nlong, Ding Zhijun, Wu Jun, Shen Fumin, Shen Heng Tao
Official PyTorch code: github.com/Vill-Lab/2021-ACMMM-PCAN
Abstract
Optical degradation blurs the shapes and edges of text, so existing scene text recognition methods struggle to achieve desirable results on low-resolution (LR) scene text images captured in natural scenes. Efficiently extracting the sequential information needed to reconstruct super-resolution (SR) text images is therefore challenging. In this paper, we propose a Parallel Contextual Attention Network (PCAN), which effectively learns sequence-dependent features and focuses on the high-frequency information that matters for reconstructing text images. First, we investigate the importance of sequence-dependent features in the horizontal and vertical directions, processed in parallel, for text SR, and design a parallel contextual attention block that adaptively selects the key information in the text sequence that contributes to image reconstruction. Second, we propose a hierarchically orthogonal texture-aware attention module and an edge guidance loss function, both of which help reconstruct high-frequency information in text images. Finally, we conduct extensive experiments on the different test sets of TextZoom, and show that our method can be easily incorporated into mainstream text recognition algorithms to further improve their performance on LR images. Compared with images upsampled by bicubic interpolation, our method improves the recognition accuracy of ASTER, MORAN, and CRNN by 14.29%, 14.35%, and 20.60%, respectively. In addition, PCAN outperforms eight state-of-the-art (SOTA) SR methods at improving recognition performance on LR images. Most importantly, it outperforms TSRN, the current best text-oriented SR method, by 3.19%, 3.65%, and 6.0% in the recognition accuracy of ASTER, MORAN, and CRNN, respectively.
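The abstract names an edge guidance loss but does not define it. As a rough illustration only, the sketch below assumes one common pattern for such losses. It extracts an edge map from the SR output and another from the high-resolution (HR) ground truth, then penalizes the mean absolute difference between them, so errors on strokes and character boundaries (high-frequency content) are penalized explicitly. The edge operator (Sobel gradient magnitude), the function names `sobel_edges` and `edge_guidance_loss`, and the plain-Python image representation are hypothetical choices. They are not the PCAN implementation, and the authors' released code remains the reference for the actual loss.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of pixel intensities

# Sobel kernels for horizontal (x) and vertical (y) intensity gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img: Image) -> Image:
    """Return the Sobel gradient magnitude at each interior pixel.

    Border pixels are skipped, so an H x W input (H, W >= 3)
    yields an (H-2) x (W-2) edge map.
    """
    h, w = len(img), len(img[0])
    edges: Image = []
    for i in range(1, h - 1):
        row = []
        for j in range(1, w - 1):
            gx = gy = 0.0
            # Slide the 3x3 kernels over the neighbourhood centred at (i, j).
            for di in range(3):
                for dj in range(3):
                    p = img[i + di - 1][j + dj - 1]
                    gx += SOBEL_X[di][dj] * p
                    gy += SOBEL_Y[di][dj] * p
            row.append((gx * gx + gy * gy) ** 0.5)
        edges.append(row)
    return edges


def edge_guidance_loss(sr: Image, hr: Image) -> float:
    """Hypothetical edge loss: mean absolute difference of the SR and HR edge maps."""
    e_sr, e_hr = sobel_edges(sr), sobel_edges(hr)
    diffs = [abs(a - b) for ra, rb in zip(e_sr, e_hr) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# Toy example: a sharp one-pixel vertical stroke (HR) and a blurred copy (SR).
hr = [[0.0, 0.0, 1.0, 0.0, 0.0] for _ in range(5)]
sr = [[0.0, 0.25, 0.5, 0.25, 0.0] for _ in range(5)]

print(edge_guidance_loss(hr, hr))  # identical images -> 0.0
print(edge_guidance_loss(sr, hr))  # blurred stroke -> 4/3, about 1.3333
```

In the toy example, the blurred stroke has half the edge strength of the sharp one on both sides of the stroke. The loss is therefore non-zero even though the two images have similar overall brightness. A pixel-wise loss alone weights this kind of difference less directly. In a real training pipeline, the same idea would run on batched tensors inside the deep learning framework rather than on Python lists.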