Leveraging Diffusion Models for Synthetic Data Augmentation in Protein Subcellular Localization Classification

2025-05-28

Sylvey Lin, Zhi-Yi Cao


Abstract

We investigate whether synthetic images generated by diffusion models can enhance multi-label classification of protein subcellular localization. Specifically, we implement a simplified class-conditional denoising diffusion probabilistic model (DDPM) to produce label-consistent samples and explore their integration with real data via two hybrid training strategies: Mix Loss and Mix Representation. While these approaches yield promising validation performance, our proposed MixModel exhibits poor generalization to unseen test data, underscoring the challenges of leveraging synthetic data effectively. In contrast, baseline classifiers built on ResNet backbones with conventional loss functions demonstrate greater stability and test-time performance. Our findings highlight the importance of realistic data generation and robust supervision when incorporating generative augmentation into biomedical image classification.
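The abstract does not give the details of the simplified class-conditional DDPM, so the following is only a minimal sketch of the standard DDPM forward (noising) step that such a model is trained against, not the authors' implementation. The linear schedule, the 1000-step horizon, the flattened 64-pixel image, and all function names are assumptions made for illustration. Class conditioning would enter through the denoising network, which is not shown here. The sketch uses only the Python standard library so it runs without extra dependencies.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form and return the noise used.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    A denoiser would be trained to predict eps from (x_t, t, label).
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bar[t])
    noise_scale = math.sqrt(1.0 - alpha_bar[t])
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return x_t, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bar = cumulative_alpha_bar(betas)

# Hypothetical flattened image with pixel values scaled to [-1, 1].
x0 = [rng.uniform(-1.0, 1.0) for _ in range(64)]

x_early, _ = q_sample(x0, 0, alpha_bar, rng)            # almost the clean image
x_late, noise_late = q_sample(x0, 999, alpha_bar, rng)  # almost pure noise
```

At the first step the sample stays close to the clean image, while at the last step it is nearly indistinguishable from the Gaussian noise. A generator trained to reverse this process from noise is what would produce the synthetic samples that the abstract describes mixing with real data.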
