SDAFE: A Dual-filter Stable Diffusion Data Augmentation Method for Facial Expression Recognition
Minghao Zhao, Yifei Chen, Jiahao Lyu, Shuangli Du, Zhiyong Lv, Lin Wang
Abstract
Facial expressions are a powerful medium for conveying emotions. In the field of facial expression recognition (FER), the difficulty of collecting certain expressions often leads to class imbalance in mainstream datasets, significantly reducing the classification accuracy of deep neural networks. To address this issue, we propose a stable-diffusion-based data augmentation method for facial expressions (SDAFE) that resolves class imbalance and improves the quality of generated data through cross-modal label guidance. Using neutral faces as a starting point, we generate additional expressions to balance the dataset classes. We introduce a peak signal-to-noise ratio (PSNR) filter to ensure the high quality of the generated images, and a cosine-similarity cross-modal filter based on CLIP encoders to ensure that the content of the generated images accurately matches their labels. Furthermore, we introduce a novel model, FERNeXt, which performs strongly on FER tasks, surpassing the state-of-the-art accuracy on the FER2013 dataset and achieving strong results on the RAF-DB and NHFI datasets. In our experiments, applying SDAFE significantly improves the performance of several models across these datasets.
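The dual-filter idea described above can be sketched in a few lines. The following is a minimal, hedged illustration, not the paper's actual implementation: images are represented as flat pixel lists, the CLIP image and text embeddings are assumed to be precomputed vectors, and the threshold values (`psnr_thresh`, `sim_thresh`) are hypothetical placeholders since the paper's abstract does not state them.

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-sized images (flat pixel lists)."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images have infinite PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def passes_dual_filter(gen_img, ref_img, image_emb, text_emb,
                       psnr_thresh=20.0, sim_thresh=0.25):
    """Keep a generated image only if it is both high quality (PSNR filter)
    and semantically consistent with its expression label (cross-modal filter).
    Thresholds here are illustrative, not the paper's values."""
    quality_ok = psnr(gen_img, ref_img) >= psnr_thresh
    label_ok = cosine_similarity(image_emb, text_emb) >= sim_thresh
    return quality_ok and label_ok
```

In practice the embeddings would come from a CLIP image encoder (for the generated face) and a CLIP text encoder (for the expression label prompt), so that only images passing both checks are added to the augmented training set.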