SOTAVerified

Speech Representation Learning

Papers

Showing 1–50 of 131 papers

Title | Status | Hype
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Code | 6
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | Code | 3
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Code | 2
Robust Self-Supervised Audio-Visual Speech Recognition | Code | 2
Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning | Code | 1
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Code | 1
CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition | Code | 1
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation | Code | 1
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale | Code | 1
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training | Code | 1
Unsupervised speech representation learning using WaveNet autoencoders | Code | 1
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data | Code | 1
Supervised Speech Representation Learning for Parkinson's Disease Classification | Code | 1
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding | Code | 1
Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion | Code | 1
SLICER: Learning universal audio representations using low-resource self-supervised pre-training | Code | 1
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Code | 1
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT | Code | 1
The Efficacy of Self-Supervised Speech Models for Audio Representations | Code | 1
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning | Code | 1
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning | Code | 1
Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users | Code | 1
data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup | Code | 1
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | Code | 1
A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Code | 1
An Unsupervised Autoregressive Model for Speech Representation Learning | Code | 1
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning | Code | 1
FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Code | 1
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT | Code | 1
Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning | Code | 1
Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning | Code | 1
DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization | Code | 1
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | Code | 1
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Code | 1
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE | | 0
Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective | | 0
A Comparison of Discrete Latent Variable Models for Speech Representation Learning | | 0
Disentangled Feature Learning for Real-Time Neural Speech Coding | | 0
ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems | | 0
A Brief Overview of Unsupervised Neural Speech Representation Learning | | 0
Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends | | 0
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | | 0
Adversarially learning disentangled speech representations for robust multi-factor voice conversion | | 0
HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition | | 0
Experiments on Turkish ASR with Self-Supervised Speech Representation Learning | | 0
Application of Knowledge Distillation to Multi-task Speech Representation Learning | | 0
Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement | | 0
Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation | | 0
Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers | | 0
General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework | | 0

No leaderboard results yet.