Speaker Recognition

Speaker Recognition is the process of identifying or confirming a person's identity from segments of their speech.

Source: Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition

Papers

Showing 1–50 of 435 papers

Title	Date	Tasks	Status	Hype	Score
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit	May 20, 2022	All, Automatic Speech Recognition (ASR)	Code Available	6	5
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark	Jul 16, 2024	Diversity, Speaker Identification	Code Available	5	5
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models	Jan 30, 2024	Self-Supervised Learning, Speaker Recognition	Code Available	3	5
Pushing the limits of raw waveform speaker recognition	Mar 16, 2022	Self-Supervised Learning, Speaker Recognition	Code Available	3	5
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition	Sep 21, 2023	Speaker Recognition	Code Available	3	5
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews	Oct 18, 2023	CPU, GPU	Code Available	3	5
SEED: Speaker Embedding Enhancement Diffusion Model	May 22, 2025	model, Speaker Recognition	Code Available	2	5
Reshape Dimensions Network for Speaker Recognition	Jul 25, 2024	Speaker Recognition	Code Available	2	5
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking	Oct 11, 2018	Speaker Recognition, Speaker Separation	Code Available	2	5
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction	Sep 4, 2024	Speaker Recognition, Speech Separation	Code Available	1	5
Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition	Jun 7, 2022	Speaker Recognition, speech-recognition	Code Available	1	5
Toroidal Probabilistic Spherical Discriminant Analysis	Oct 27, 2022	Form, Speaker Recognition	Code Available	1	5
Universal Adversarial Perturbations Generative Network for Speaker Recognition	Apr 7, 2020	Speaker Recognition	Code Available	1	5
Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning	Oct 22, 2020	Representation Learning, Speaker Recognition	Code Available	1	5
Training speaker recognition systems with limited data	Mar 28, 2022	Speaker Recognition	Code Available	1	5
Utterance-level Aggregation For Speaker Recognition In The Wild	Feb 26, 2019	Speaker Recognition, Text-Independent Speaker Verification	Code Available	1	5
SpeechNAS: Towards Better Trade-off between Latency and Accuracy for Large-Scale Speaker Verification	Sep 18, 2021	Neural Architecture Search, Speaker Recognition	Code Available	1	5
Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models	Feb 25, 2020	Speaker Identification, Speaker Recognition	Code Available	1	5
Speaker Recognition from Raw Waveform with SincNet	Jul 29, 2018	Speaker Identification, Speaker Recognition	Code Available	1	5
Speaker Recognition in the Wild	May 5, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1	5
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?	Jun 14, 2023	Natural Language Understanding, Self-Supervised Learning	Code Available	1	5
HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE	Nov 12, 2021	Domain Adaptation, Speaker Recognition	Code Available	1	5
Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs	Apr 6, 2020	Meta-Learning, Speaker Identification	Code Available	1	5
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech	Jul 12, 2020	Keyword Spotting, Self-Supervised Learning	Code Available	1	5
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts	May 24, 2022	Face Detection, Face Generation	Code Available	1	5
Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel’s Weekly Video Podcasts	Jun 1, 2022	Face Detection, Face Generation	Code Available	1	5
Neural PLDA Modeling for End-to-End Speaker Verification	Aug 11, 2020	Speaker Recognition, Speaker Verification	Code Available	1	5
Speech and Speaker Recognition from Raw Waveform with SincNet	Dec 13, 2018	Inductive Bias, Speaker Recognition	Code Available	1	5
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset	Jan 16, 2023	Audio-Visual Speech Recognition, Lip Reading	Code Available	1	5
NPLDA: A Deep Neural PLDA Model for Speaker Verification	Feb 10, 2020	Speaker Recognition, Speaker Verification	Code Available	1	5
Self-supervised Speaker Recognition with Loss-gated Learning	Oct 8, 2021	Self-Supervised Learning, Speaker Recognition	Code Available	1	5
Adversarial Attack and Defense Strategies for Deep Speaker Recognition Systems	Aug 18, 2020	Adversarial Attack, Adversarial Robustness	Code Available	1	5
SEC4SR: A Security Analysis Platform for Speaker Recognition	Sep 4, 2021	Speaker Recognition	Code Available	1	5
Speaker anonymisation using the McAdams coefficient	Nov 2, 2020	Speaker Recognition	Code Available	1	5
Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings	Mar 28, 2022	Speaker Recognition	Code Available	1	5
BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition	Jun 30, 2019	Avg, Representation Learning	Code Available	1	5
Bias in Automated Speaker Recognition	Jan 24, 2022	BIG-bench Machine Learning, Face Recognition	Code Available	1	5
Crossed-Time Delay Neural Network for Speaker Recognition	May 31, 2020	Speaker Recognition, Speaker Verification	Code Available	1	5
AM-MobileNet1D: A Portable Model for Speaker Recognition	Mar 31, 2020	Deep Learning, model	Code Available	1	5
Probabilistic Back-ends for Online Speaker Recognition and Clustering	Feb 19, 2023	Clustering, Online Clustering	Code Available	1	5
EfficientTDNN: Efficient Architecture Search for Speaker Recognition	Mar 25, 2021	Data Augmentation, Network Pruning	Code Available	1	5
Speaker recognition with two-step multi-modal deep cleansing	Oct 28, 2022	Representation Learning, Speaker Recognition	Code Available	1	5
Exploring Deep Learning for Joint Audio-Visual Lip Biometrics	Apr 17, 2021	Deep Learning, Speaker Recognition	Code Available	1	5
Fine-tuning wav2vec2 for speaker recognition	Sep 30, 2021	Classification, Speaker Recognition	Code Available	1	5
Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model	Sep 12, 2018	Speaker Recognition, Text-Independent Speaker Recognition	Code Available	1	5
AutoSpeech: Neural Architecture Search for Speaker Recognition	May 7, 2020	image-classification, Image Classification	Code Available	1	5
Deep Discriminative Feature Learning for Accent Recognition	Nov 25, 2020	Face Recognition, Speaker Identification	Code Available	1	5
Leveraging speaker attribute information using multi task learning for speaker verification and diarization	Oct 27, 2020	Attribute, Multi-Task Learning	Code Available	1	5
Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemetic Analysis	Oct 7, 2021	Speaker Recognition, Speaker Verification	Code Available	1	5
Speaker embeddings by modeling channel-wise correlations	Apr 6, 2021	Speaker Recognition, Style Transfer	Code Available	1	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	w2v2-aam	EER	1.88	—	Unverified
2	WavLM+ECAPA-TDNN	EER	0.39	—	Unverified
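
For readers unfamiliar with the EER column: the equal error rate is the operating point at which the false acceptance rate equals the false rejection rate, so lower values are better. The sketch below shows one simple way to approximate it from verification trial scores. It is not taken from either listed system; the function name and the toy scores are illustrative assumptions. The table does not state its units; EER is commonly reported in percent, whereas this sketch returns a fraction.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    target_scores: scores for same-speaker trials (hypothetical input).
    nontarget_scores: scores for different-speaker trials (hypothetical input).
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, made up for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))  # 0.25
```

With these toy scores, a threshold of 0.6 rejects one of four same-speaker trials and accepts one of four different-speaker trials, so both error rates are 0.25. Production toolkits typically interpolate along the ROC or DET curve rather than picking the nearest observed threshold.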