Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.
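Under this definition, the core of a diarization system is a grouping step. Once the audio has been cut into segments and each segment has been turned into a speaker embedding, segments with similar embeddings receive the same anonymous label, and the speaker count falls out as the number of distinct labels. The sketch below illustrates that step with average-linkage agglomerative hierarchical clustering (AHC) on cosine similarity, the kind of back-end behind entries such as COS+AHC in the benchmark tables. It is not the method of any specific paper listed here. It assumes segment boundaries and embeddings are already available. The function names, the toy 2-D vectors, and the 0.5 similarity threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cluster_segments(embeddings, threshold=0.5):
    """Average-linkage AHC over segment embeddings (hypothetical helper).

    Returns one integer label per segment. Labels are anonymous: they say
    which segments share a speaker, not who the speaker is.
    """
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise cosine similarity between two clusters.
        sims = [cosine(embeddings[i], embeddings[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best, pair = -math.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        # Stop once no pair is similar enough to be the same speaker.
        if best < threshold:
            break
        a, b = pair  # a < b, so deleting index b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy "embeddings" for five consecutive segments from two alternating voices.
segments = [(0.0, 1.5), (1.5, 3.0), (3.0, 4.2), (4.2, 6.0), (6.0, 7.1)]
embeddings = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2], [0.2, 0.9], [1.0, 0.0]]
labels = cluster_segments(embeddings, threshold=0.5)
for (start, end), spk in zip(segments, labels):
    print(f"{start:4.1f}-{end:4.1f}s  speaker {spk}")
print("estimated number of speakers:", len(set(labels)))
```

The stopping threshold is what turns "grouping" into "counting". A stricter threshold splits one voice into several labels, and a looser one merges different voices. Tuning it, or replacing it with an automatic estimate, is a central design choice in clustering-based systems.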

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

Papers


Showing 201–250 of 328 papers

Title	Date	Tasks	Status
The USTC-NERCSLIP Systems for The ICMC-ASR Challenge	Jul 2, 2024	Automatic Speech Recognition, Pseudo Label	Unverified
The USTC-Ximalaya system for the ICASSP 2022 multi-channel multi-party meeting transcription (M2MeT) challenge	Feb 10, 2022	Action Detection, Activity Detection	Unverified
The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge	Feb 9, 2022	Data Augmentation, Language Modelling	Unverified
The xmuspeech system for multi-channel multi-party meeting transcription challenge	Feb 11, 2022	speaker-diarization, Speaker Diarization	Unverified
Third DIHARD Challenge Evaluation Plan	Oct 30, 2020	speaker-diarization, Speaker Diarization	Unverified
"This is Houston. Say again, please". The Behavox system for the Apollo-11 Fearless Steps Challenge (phase II)	Aug 4, 2020	Action Detection, Activity Detection	Unverified
Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network	Apr 7, 2021	Binary Classification, speaker-diarization	Unverified
Tight integration of neural- and clustering-based diarization through deep unfolding of infinite Gaussian mixture model	Feb 14, 2022	Clustering, speaker-diarization	Unverified
Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams	Jul 12, 2019	Clustering, speaker-diarization	Unverified
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch	Dec 11, 2024	Denoising, speaker-diarization	Unverified
Towards end-2-end learning for predicting behavior codes from spoken utterances in psychotherapy conversations	Jul 1, 2020	Action Detection, Activity Detection	Unverified
Late Audio-Visual Fusion for In-The-Wild Speaker Diarization	Nov 2, 2022	speaker-diarization, Speaker Diarization	Unverified
Towards Measuring and Scoring Speaker Diarization Fairness	Feb 20, 2023	Fairness, Sentence	Unverified
Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio	May 21, 2023	speaker-diarization, Speaker Diarization	Unverified
Towards Unsupervised Speaker Diarization System for Multilingual Telephone Calls Using Pre-trained Whisper Model and Mixture of Sparse Autoencoders	Jul 2, 2024	Clustering, speaker-diarization	Unverified
Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network	Sep 15, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries	Mar 29, 2022	speaker-diarization, Speaker Diarization	Unverified
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR	Oct 7, 2021	Action Detection, Activity Detection	Unverified
Triplet Network with Attention for Speaker Diarization	Aug 4, 2018	Metric Learning, speaker-diarization	Unverified
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge	Oct 26, 2022	Action Detection, Activity Detection	Unverified
Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification	Dec 28, 2023	speaker-diarization, Speaker Diarization	Unverified
Unified Audio Event Detection	Sep 13, 2024	Event Detection, Sound Event Detection	Unverified
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection	Jan 7, 2025	Action Detection, Activity Detection	Unverified
UniX-Encoder: A Universal X-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing	Oct 25, 2023	speaker-diarization, Speaker Diarization	Unverified
Unsupervised Adaptation of SPLDA	Nov 20, 2015	speaker-diarization, Speaker Diarization	Unverified
Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning	Apr 16, 2024	Change Detection, Federated Learning	Unverified
Unsupervised Speaker Diarization that is Agnostic to Language, Overlap-Aware, and Tuning Free	Jul 25, 2022	speaker-diarization, Speaker Diarization	Unverified
Using Active Speaker Faces for Diarization in TV shows	Mar 30, 2022	Face Clustering, Face Detection	Unverified
Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones	Jul 31, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge	May 27, 2019	Clustering, speaker-diarization	Unverified
VoxLingua107: A Dataset for Spoken Language Recognition	Nov 25, 2020	Action Detection, Activity Detection	Unverified
VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering	May 22, 2025	Question Answering, RAG	Unverified
Weakly Supervised Training of Speaker Identification Models	Jun 22, 2018	speaker-diarization, Speaker Diarization	Unverified
An approach to optimize inference of the DIART speaker diarization pipeline	Aug 5, 2024	Inference Optimization, Knowledge Distillation	Unverified
X-Vectors with Multi-Scale Aggregation for Speaker Diarization	May 16, 2021	speaker-diarization, Speaker Diarization	Unverified
A Benchmark for Multi-speaker Anonymization	Jul 8, 2024	Benchmarking, Disentanglement	Unverified
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio	Jul 6, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings	Nov 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Advances in Online Audio-Visual Meeting Transcription	Dec 10, 2019	Sound Source Localization, speaker-diarization	Unverified
A framework for the automatic inference of stochastic turn-taking styles	Sep 1, 2016	Speaker Diarization	Unverified
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond	Feb 6, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
AG-LSEC: Audio Grounded Lexical Speaker Error Correction	Jun 25, 2024	Language Modeling, Language Modelling	Unverified
Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (Extended Version)	Sep 14, 2023	Multiple Sequence Alignment, speaker-diarization	Unverified
All-neural online source separation, counting, and diarization for meeting analysis	Feb 21, 2019	All, Automatic Speech Recognition	Unverified
An Alternative to Low-level-Sychrony-Based Methods for Speech Detection	Dec 1, 2010	Facial Expression Recognition, Facial Expression Recognition (FER)	Unverified
An automated medical scribe for documenting clinical encounters	Jun 1, 2018	speaker-diarization, Speaker Diarization	Unverified
An Effortless Way To Create Large-Scale Datasets For Famous Speakers	May 1, 2014	Person Identification, Speaker Diarization	Unverified
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings	May 29, 2023	Clustering, speaker-diarization	Unverified
An Infinite Hidden Markov Model With Similarity-Biased Transitions	Jul 21, 2017	speaker-diarization, Speaker Diarization	Unverified
基於i-vector與PLDA並使用GMM-HMM強制對位之自動語者分段標記系統 (Speaker Diarization based on I-vector PLDA Scoring and using GMM-HMM Forced Alignment) [In Chinese]	Nov 1, 2017	speaker-diarization, Speaker Diarization	Unverified



Benchmark datasets: CALLHOME, NIST-SRE 2000, AMI Lapel, AMI MixHeadset, CH109, DIHARD, ETAPE, AMI, CALLHOME-109, AliMeeting, DIHARD II, Hub5'00 CallHome

Benchmark Results

CALLHOME
#	Model	Metric	Claimed	Verified	Status
1	COS+NJW-SC (Oracle SAD)	DER(%)	24.05	—	Unverified
2	EEND	DER(%)	23.07	—	Unverified
3	COS+AHC (Oracle SAD)	DER(%)	21.13	—	Unverified
4	SA-EEND (2-spk, no-adapt)	DER(%)	12.66	—	Unverified
5	EEND-OLA	DER(%)	12.57	—	Unverified
6	SA-EEND (2-spk, adapted)	DER(%)	10.76	—	Unverified
7	TOLD	DER(%)	10.14	—	Unverified
8	COS+B-SC (Oracle SAD)	DER(%, overlap ignored)	8.78	—	Unverified
9	PLDA+AHC (Oracle SAD)	DER(%, overlap ignored)	8.39	—	Unverified
10	COS+NME-SC (Oracle SAD)	DER(%, overlap ignored)	7.29	—	Unverified
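The benchmark tables report diarization error rate (DER). As commonly defined, DER is the share of reference speech time that is missed, falsely detected as speech, or attributed to the wrong speaker. It is computed after the best one-to-one mapping between the system's anonymous labels and the reference labels. The table above also contains rows scored with overlapped speech excluded, which this metric presumably means in practice: frames where more than one reference speaker talks are dropped before scoring. The sketch below computes both variants on discretised frames. It is a hypothetical illustration, not the official NIST scoring tool. It applies no forgiveness collar around boundaries, and the frame representation, function name, and toy labels are assumptions.

```python
from itertools import permutations

def der(ref, hyp, ignore_overlap=False):
    """Frame-level diarization error rate, in percent (hypothetical helper).

    ref, hyp: equal-length lists; element t is the set of speaker labels
    active in frame t (an empty set means non-speech).
    """
    assert len(ref) == len(hyp)
    # Optionally drop frames where the reference has overlapping speakers.
    frames = [t for t in range(len(ref))
              if not (ignore_overlap and len(ref[t]) > 1)]
    ref_spk = sorted({s for t in frames for s in ref[t]})
    hyp_spk = sorted({s for t in frames for s in hyp[t]})

    def n_correct(mapping):
        # Speech attributed to the right speaker under one label mapping.
        return sum(len({mapping.get(h) for h in hyp[t]} & ref[t]) for t in frames)

    # Hypothesis labels are anonymous: try every one-to-one mapping onto
    # reference labels and keep the best (brute force, fine for a few speakers).
    if len(hyp_spk) <= len(ref_spk):
        maps = (dict(zip(hyp_spk, p)) for p in permutations(ref_spk, len(hyp_spk)))
    else:
        maps = (dict(zip(p, ref_spk)) for p in permutations(hyp_spk, len(ref_spk)))
    correct = max(n_correct(m) for m in maps)

    total = sum(len(ref[t]) for t in frames)                       # reference speech
    miss = sum(max(0, len(ref[t]) - len(hyp[t])) for t in frames)  # missed speech
    fa = sum(max(0, len(hyp[t]) - len(ref[t])) for t in frames)    # false alarm
    conf = sum(min(len(ref[t]), len(hyp[t])) for t in frames) - correct  # confusion
    return 100.0 * (miss + fa + conf) / total

# Ten frames: A speaks, A and B overlap in frame 4, B speaks, then silence.
ref = [{"A"}] * 4 + [{"A", "B"}] + [{"B"}] * 4 + [set()]
# System output with its own labels: it misses the overlap and runs one frame long.
hyp = [{"s1"}] * 5 + [{"s2"}] * 5
print(der(ref, hyp))                       # 20.0
print(der(ref, hyp, ignore_overlap=True))  # 12.5
```

In the toy case, the missed overlap costs one frame and the trailing false alarm costs another, out of ten frames of reference speech. Excluding the overlapped frame removes both that error and that frame from the denominator, which is why overlap-ignored numbers are not directly comparable with standard DER.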

NIST-SRE 2000
#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	8.39	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	6.73	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	6.47	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	6.37	—	Unverified
5	x-vector (MCGAN)	DER(%)	5.73	—	Unverified

AMI Lapel
#	Model	Metric	Claimed	Verified	Status
1	ECAPA (SC)	DER(%)	2.36	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	2.03	—	Unverified
3	TitaNet-S (NME-SC)	DER(%)	2	—	Unverified
4	TitaNet-M (NME-SC)	DER(%)	1.99	—	Unverified

AMI MixHeadset
#	Model	Metric	Claimed	Verified	Status
1	TitaNet-S (NME-SC)	DER(%)	2.22	—	Unverified
2	TitaNet-M (NME-SC)	DER(%)	1.79	—	Unverified
3	ECAPA (SC)	DER(%)	1.78	—	Unverified
4	TitaNet-L (NME-SC)	DER(%)	1.73	—	Unverified

CH109
#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	9.72	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	1.19	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	1.13	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	1.11	—	Unverified

DIHARD
#	Model	Metric	Claimed	Verified	Status
1	Baseline (the best result in the literature as of Oct. 2019)	DER(%)	11.2	—	Unverified
2	pyannote (MFCC)	DER(%)	10.5	—	Unverified
3	pyannote (waveform)	DER(%)	9.9	—	Unverified

ETAPE
#	Model	Metric	Claimed	Verified	Status
1	Baseline	DER(%)	7.7	—	Unverified
2	pyannote (MFCC)	DER(%)	5.6	—	Unverified
3	pyannote (waveform)	DER(%)	4.9	—	Unverified

AMI
#	Model	Metric	Claimed	Verified	Status
1	pyannote (MFCC)	DER(%)	6.3	—	Unverified
2	pyannote (waveform)	DER(%)	6	—	Unverified

CALLHOME-109
#	Model	Metric	Claimed	Verified	Status
1	d-vector + spectral	DER(%)	12.54	—	Unverified
2	titanet-s	DER(%)	1.11	—	Unverified

AliMeeting
#	Model	Metric	Claimed	Verified	Status
1	SOND	DER(%)	4.46	—	Unverified

DIHARD II
#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN-SML	DER(%)	27.3	—	Unverified

Hub5'00 CallHome
#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN	V	10.6	—	Unverified