| Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends | Jan 2, 2020 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | Sep 21, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Adversarially learning disentangled speech representations for robust multi-factor voice conversion | Jan 30, 2021 | Representation Learning, Rhythm | Unverified | 0 |
| HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition | Jun 3, 2025 | Emotion Recognition, Representation Learning | Unverified | 0 |
| Experiments on Turkish ASR with Self-Supervised Speech Representation Learning | Oct 13, 2022 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Application of Knowledge Distillation to Multi-task Speech Representation Learning | Oct 29, 2022 | Keyword Spotting, Knowledge Distillation | Unverified | 0 |
| Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement | Nov 12, 2022 | Data Augmentation, Emotion Recognition | Unverified | 0 |
| Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation | Jun 17, 2019 | Clustering, Representation Learning | Unverified | 0 |
| Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers | Jun 9, 2020 | General Classification, Representation Learning | Unverified | 0 |
| General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework | Feb 3, 2021 | Classification, Emotion Classification | Unverified | 0 |