SDAFE: A Dual-filter Stable Diffusion Data Augmentation Method for Facial Expression Recognition
Minghao Zhao, Yifei Chen, Jiahao Lyu, Shuangli Du, Zhiyong Lv, Lin Wang
Abstract
Facial expressions are a powerful medium for conveying emotions. In the field of facial expression recognition (FER), the difficulty of collecting certain expressions often leads to class imbalance in mainstream datasets, significantly reducing the classification accuracy of deep neural networks. To address this issue, we propose a stable-diffusion-based data augmentation method for facial expressions (SDAFE) that resolves class imbalance and improves the quality of generated data through cross-modal label guidance. Using neutral faces as a starting point, we generate additional expressions to balance the dataset classes. We introduce a peak signal-to-noise ratio (PSNR) filter to ensure the high quality of the generated images, and a cosine-similarity cross-modal filter based on CLIP encoders to ensure that the content of the generated images accurately matches their labels. Furthermore, we introduce a novel model, FERNeXt, which performs strongly on FER tasks, surpassing the state-of-the-art accuracy on the FER2013 dataset and achieving strong results on the RAF-DB and NHFI datasets. In our experiments, applying SDAFE significantly improves the performance of several models across these datasets.
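The dual-filter idea described above can be sketched in a few lines. The following is a minimal, hedged illustration, not the paper's actual implementation: images are represented as flat pixel lists, the CLIP image and text embeddings are assumed to be precomputed vectors, and the threshold values (`psnr_thresh`, `sim_thresh`) are hypothetical placeholders since the paper's abstract does not state them.

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-sized images (flat pixel lists)."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images have infinite PSNR
    return 10.0 * math.log10(max_val ** 2 / mse)

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def passes_dual_filter(gen_img, ref_img, image_emb, text_emb,
                       psnr_thresh=20.0, sim_thresh=0.25):
    """Keep a generated image only if it is both high quality (PSNR filter)
    and semantically consistent with its expression label (cross-modal filter).
    Thresholds here are illustrative, not the paper's values."""
    quality_ok = psnr(gen_img, ref_img) >= psnr_thresh
    label_ok = cosine_similarity(image_emb, text_emb) >= sim_thresh
    return quality_ok and label_ok
```

In practice the embeddings would come from a CLIP image encoder (for the generated face) and a CLIP text encoder (for the expression label prompt), so that only images passing both checks are added to the augmented training set.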