Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 226–250 of 328 papers

Title	Date	Tasks	Status
Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries	Mar 29, 2022	speaker-diarizationSpeaker Diarization	—Unverified
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR	Oct 7, 2021	Action DetectionActivity Detection	—Unverified
Triplet Network with Attention for Speaker Diarization	Aug 4, 2018	Metric Learningspeaker-diarization	—Unverified
TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge	Oct 26, 2022	Action DetectionActivity Detection	—Unverified
Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification	Dec 28, 2023	speaker-diarizationSpeaker Diarization	—Unverified
Unified Audio Event Detection	Sep 13, 2024	Event DetectionSound Event Detection	—Unverified
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection	Jan 7, 2025	Action DetectionActivity Detection	—Unverified
UniX-Encoder: A Universal X-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing	Oct 25, 2023	speaker-diarizationSpeaker Diarization	—Unverified
Unsupervised Adaptation of SPLDA	Nov 20, 2015	speaker-diarizationSpeaker Diarization	—Unverified
Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning	Apr 16, 2024	Change DetectionFederated Learning	—Unverified
Unsupervised Speaker Diarization that is Agnostic to Language, Overlap-Aware, and Tuning Free	Jul 25, 2022	speaker-diarizationSpeaker Diarization	—Unverified
Using Active Speaker Faces for Diarization in TV shows	Mar 30, 2022	Face ClusteringFace Detection	—Unverified
Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones	Jul 31, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge	May 27, 2019	Clusteringspeaker-diarization	—Unverified
VOXLINGUA107: A DATASET FOR SPOKEN LANGUAGE RECOGNITION	Nov 25, 2020	Action DetectionActivity Detection	—Unverified
VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering	May 22, 2025	Question AnsweringRAG	—Unverified
Weakly Supervised Training of Speaker Identification Models	Jun 22, 2018	speaker-diarizationSpeaker Diarization	—Unverified
An approach to optimize inference of the DIART speaker diarization pipeline	Aug 5, 2024	Inference OptimizationKnowledge Distillation	—Unverified
X-Vectors with Multi-Scale Aggregation for Speaker Diarization	May 16, 2021	speaker-diarizationSpeaker Diarization	—Unverified
3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization	Mar 29, 2024	Self-Supervised Learningspeaker-diarization	—Unverified
A Benchmark for Multi-speaker Anonymization	Jul 8, 2024	BenchmarkingDisentanglement	—Unverified
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio	Jul 6, 2021	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings	Nov 1, 2022	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Advances in Online Audio-Visual Meeting Transcription	Dec 10, 2019	Sound Source Localizationspeaker-diarization	—Unverified
A framework for the automatic inference of stochastic turn-taking styles	Sep 1, 2016	Speaker Diarization	—Unverified

Show:10 25 50

← PrevPage 10 of 14Next →

All datasets CALLHOME NIST-SRE 2000 AMI Lapel AMI MixHeadset CH109 DIHARD ETAPE AMI CALLHOME-109 AliMeeting DIHARD II Hub5'00 CallHome

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	COS+NJW-SC (Oracle SAD)	DER(%)	24.05	—	Unverified
2	EEND	DER(%)	23.07	—	Unverified
3	COS+AHC (Oracle SAD)	DER(%)	21.13	—	Unverified
4	SA-EEND (2-spk, no-adapt)	DER(%)	12.66	—	Unverified
5	EEND-OLA	DER(%)	12.57	—	Unverified
6	SA-EEND (2-spk, adapted)	DER(%)	10.76	—	Unverified
7	TOLD	DER(%)	10.14	—	Unverified
8	COS+B-SC (Oracle SAD)	DER(ig olp)	8.78	—	Unverified
9	PLDA+AHC (Oracle SAD)	DER(ig olp)	8.39	—	Unverified
10	COS+NME-SC (Oracle SAD)	DER(ig olp)	7.29	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	8.39	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	6.73	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	6.47	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	6.37	—	Unverified
5	x-vector (MCGAN)	DER(%)	5.73	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ECAPA (SC)	DER(%)	2.36	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	2.03	—	Unverified
3	TitaNet-S (NME-SC)	DER(%)	2	—	Unverified
4	TitaNet-M (NME-SC)	DER(%)	1.99	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TitaNet-S (NME-SC)	DER(%)	2.22	—	Unverified
2	TitaNet-M (NME-SC)	DER(%)	1.79	—	Unverified
3	ECAPA (SC)	DER(%)	1.78	—	Unverified
4	TitaNet-L (NME-SC)	DER(%)	1.73	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	9.72	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	1.19	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	1.13	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	1.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline (the best result in the literature as of Oct.2019)	DER(%)	11.2	—	Unverified
2	pyannote (MFCC)	DER(%)	10.5	—	Unverified
3	pyannote (waveform)	DER(%)	9.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline	DER(%)	7.7	—	Unverified
2	pyannote (MFCC)	DER(%)	5.6	—	Unverified
3	pyannote (waveform)	DER(%)	4.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	pyannote (MFCC)	DER(%)	6.3	—	Unverified
2	pyannote (waveform)	DER(%)	6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	d-vector + spectral	DER(%)	12.54	—	Unverified
2	titanet-s	DER(%)	1.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SOND	DER(%)	4.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN-SML	DER(%)	27.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN	V	10.6	—	Unverified