| HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition | Jun 3, 2025 | Emotion Recognition, Representation Learning | Unverified | 0 |
| DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation | May 26, 2025 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Universal Semantic Disentangled Privacy-preserving Speech Representation Learning | May 19, 2025 | Decoder, Privacy Preserving | Unverified | 0 |
| UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation | Mar 2, 2025 | Decoder, Representation Learning | Unverified | 0 |
| Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available | 1 |
| k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Oct 17, 2024 | Representation Learning, Self-Supervised Learning | Code Available | 1 |
| JOOCI: a Framework for Learning Comprehensive Speech Representations | Oct 14, 2024 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | Sep 21, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT | Sep 16, 2024 | Acoustic Unit Discovery, Clustering | Code Available | 1 |