Improving Acoustic Scene Classification in Low-Resource Conditions

2024-12-30Unverified0· sign in to hype

Zhi Chen, Yun-Fei Shao, Yong Ma, Mingsheng Wei, Le Zhang, Wei-Qiang Zhang

Unverified — Be the first to reproduce this paper.

Abstract

Acoustic Scene Classification (ASC) identifies an environment based on an audio signal. This paper explores ASC in low-resource conditions and proposes a novel model, DS-FlexiNet, which combines depthwise separable convolutions from MobileNetV2 with ResNet-inspired residual connections for a balance of efficiency and accuracy. To address hardware limitations and device heterogeneity, DS-FlexiNet employs Quantization Aware Training (QAT) for model compression and data augmentation methods like Auto Device Impulse Response (ADIR) and Freq-MixStyle (FMS) to improve cross-device generalization. Knowledge Distillation (KD) from twelve teacher models further enhances performance on unseen devices. The architecture includes a custom Residual Normalization layer to handle domain differences across devices, and depthwise separable convolutions reduce computational overhead without sacrificing feature representation. Experimental results show that DS-FlexiNet excels in both adaptability and performance under resource-constrained conditions.

Tasks

Acoustic Scene Classification Classification Data Augmentation Knowledge Distillation Model Compression Quantization Scene Classification

Improving Acoustic Scene Classification in Low-Resource Conditions

Abstract

Tasks

Reproductions