| HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition | Jun 3, 2025 | Emotion Recognition, Representation Learning | Unverified | 0 |
| DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation | May 26, 2025 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Universal Semantic Disentangled Privacy-preserving Speech Representation Learning | May 19, 2025 | Decoder, Privacy Preserving | Unverified | 0 |
| UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation | Mar 2, 2025 | Decoder, Representation Learning | Unverified | 0 |
| Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available | 1 |
| k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Oct 17, 2024 | Representation Learning, Self-Supervised Learning | Code Available | 1 |
| JOOCI: a Framework for Learning Comprehensive Speech Representations | Oct 14, 2024 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | Sep 21, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT | Sep 16, 2024 | Acoustic Unit Discovery, Clustering | Code Available | 1 |
| Progressive Residual Extraction based Pre-training for Speech Representation Learning | Aug 31, 2024 | Emotion Recognition, Representation Learning | Unverified | 0 |
| Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation | Aug 20, 2024 | Data Augmentation, Representation Learning | Unverified | 0 |
| Towards the Next Frontier in Speech Representation Learning Using Disentanglement | Jul 2, 2024 | Disentanglement, Representation Learning | Unverified | 0 |
| Towards Robust Speech Representation Learning for Thousands of Languages | Jun 30, 2024 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge | Jun 10, 2024 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| mHuBERT-147: A Compact Multilingual HuBERT Model | Jun 10, 2024 | Automatic Speech Recognition (ASR), Diversity | Code Available | 0 |
| XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception | Mar 21, 2024 | Audio-Visual Speech Recognition, Representation Learning | Unverified | 0 |
| An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning | Mar 13, 2024 | Denoising, Knowledge Distillation | Code Available | 0 |
| The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning | Feb 21, 2024 | Benchmarking, Representation Learning | Code Available | 1 |
| UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization | Jan 26, 2024 | Decoder, Domain Adaptation | Unverified | 0 |
| Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective | Jan 16, 2024 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| Efficiency-oriented approaches for self-supervised speech representation learning | Dec 18, 2023 | Automatic Speech Recognition, Representation Learning | Unverified | 0 |
| Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion | Nov 14, 2023 | Deep Learning, Diversity | Unverified | 0 |
| Learning Disentangled Speech Representations | Nov 4, 2023 | Benchmarking, Disentanglement | Unverified | 0 |
| Privacy-preserving Representation Learning for Speech Understanding | Oct 26, 2023 | Classification, Emotion Recognition | Unverified | 0 |
| CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition | Oct 18, 2023 | Audio Classification, Contrastive Learning | Code Available | 1 |
| MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning | Oct 17, 2023 | Disentanglement, Representation Learning | Code Available | 0 |
| Spatial HuBERT: Self-supervised Spatial Speech Representation Learning for a Single Talker from Multi-channel Audio | Oct 17, 2023 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| Evaluating Self-Supervised Speech Representations for Indigenous American Languages | Oct 5, 2023 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning | Sep 25, 2023 | Representation Learning, Self-Supervised Learning | Code Available | 1 |
| QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Aug 31, 2023 | Representation Learning, Speech Representation Learning | Code Available | 1 |
| Speech representation learning: Learning bidirectional encoders with single-view, multi-view, and multi-task methods | Jul 25, 2023 | MULTI-VIEW LEARNING, Representation Learning | Unverified | 0 |
| MASR: Multi-label Aware Speech Representation | Jul 20, 2023 | Emotion Recognition, Language Identification | Unverified | 0 |
| On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation | Jul 6, 2023 | Keyword Spotting, Knowledge Distillation | Unverified | 0 |
| Flowchase: a Mobile Application for Pronunciation Training | Jul 5, 2023 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Label Aware Speech Representation Learning For Language Identification | Jun 7, 2023 | Language Identification, Missing Labels | Unverified | 0 |
| Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System | Jun 5, 2023 | Multi-Task Learning, Representation Learning | Unverified | 0 |
| An empirical study on speech restoration guided by self supervised speech representation | May 30, 2023 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition | May 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition | May 23, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning | May 17, 2023 | Clustering, Language Modeling | Code Available | 1 |
| A multimodal dynamical variational autoencoder for audiovisual speech representation learning | May 5, 2023 | Denoising, Disentanglement | Code Available | 0 |
| Learning Cross-lingual Visual Speech Representations | Mar 14, 2023 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Mar 9, 2023 | 3D Face Animation, Representation Learning | Code Available | 1 |
| Self-supervised speech representation learning for keyword-spotting with light-weight transformers | Mar 7, 2023 | Keyword Spotting, Representation Learning | Unverified | 0 |
| Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding | Feb 27, 2023 | Model Compression, Representation Learning | Code Available | 1 |
| A low latency attention module for streaming self-supervised speech representation learning | Feb 27, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Efficient Speech Representation Learning with Low-Bit Quantization | Dec 14, 2022 | Model Compression, Quantization | Unverified | 0 |
| Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information | Dec 7, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Disentangled Feature Learning for Real-Time Neural Speech Coding | Nov 22, 2022 | Disentanglement, Representation Learning | Unverified | 0 |