An hybrid CNN-Transformer model based on multi-feature extraction and attention fusion mechanism for cerebral emboli classification
Yamil Vindas Yassine, Blaise Kévin Guépié, Marilys Almar, Emmanuel Roux, Philippe Delachartre
Code Available — Be the first to reproduce this paper.
ReproduceCode
- github.com/yamilvindas/gdecpytorch★ 4
Abstract
When dealing with signal processing and deep learning for classification, the choice of inputting whether the raw signal or transforming it into a time-frequency representation (TFR) remains an open question. In this work, we propose a novel CNN-Transformer model based on multi-feature extraction and learnable representation attention weights per class to do classification with raw signals and TFRs. First, we start by extracting a TFR from the raw signal. Then, we train two models to extract intermediate representations from the raw signals and the TFRs. We use a CNN-Transformer model to process the raw signal and a 2D CNN for the TFR. Finally, we train a classifier that combines the outputs of both models (late fusion) using learnable and interpretable attention weights per class. We evaluate our approach on three medical datasets: a cerebral emboli dataset (HITS), and two electrocardiogram datasets, PTB and MIT-BIH, for heartbeat categorization. The results show that our multi-feature fusion approach improves the classification performance with respect to the use of a single feature method or other multi-feature fusion methods. Furthermore, it achieves state-of-the-art results on the HITS and PTB datasets with a classification accuracy of 93, 4% and 99, 7%, respectively. It also achieves excellent performance on the MIT-BIH dataset, with an accuracy of 98, 4% and a lighter model than other state-of-the-art methods. What is more, our fusion method provides interpretable attention weights per class indicating the importance of each representation for the final decision of the classifier.