Learning Spatial Pyramid Attentive Pooling in Image Synthesis and Image-to-Image Translation
Wei Sun, Tianfu Wu
Unverified — Be the first to reproduce this paper.
ReproduceAbstract
Image synthesis and image-to-image translation are two important generative learning tasks. Remarkable progress has been made by learning Generative Adversarial Networks (GANs)~goodfellow2014generative and cycle-consistent GANs (CycleGANs)~zhu2017unpaired respectively. This paper presents a method of learning Spatial Pyramid Attentive Pooling (SPAP) which is a novel architectural unit and can be easily integrated into both generators and discriminators in GANs and CycleGANs. The proposed SPAP integrates Atrous spatial pyramid~chen2018deeplab, a proposed cascade attention mechanism and residual connections~he2016deep. It leverages the advantages of the three components to facilitate effective end-to-end generative learning: (i) the capability of fusing multi-scale information by ASPP; (ii) the capability of capturing relative importance between both spatial locations (especially multi-scale context) or feature channels by attention; (iii) the capability of preserving information and enhancing optimization feasibility by residual connections. Coarse-to-fine and fine-to-coarse SPAP are studied and intriguing attention maps are observed in both tasks. In experiments, the proposed SPAP is tested in GANs on the Celeba-HQ-128 dataset~karras2017progressive, and tested in CycleGANs on the Image-to-Image translation datasets including the Cityscape dataset~cordts2016cityscapes, Facade and Aerial Maps dataset~zhu2017unpaired, both obtaining better performance.