Speaker Diarization

Speaker diarization is the task of segmenting audio recordings by speaker and co-indexing the resulting segments. As the task is commonly defined, the goal is not to identify known speakers but to co-index segments that are attributed to the same speaker. In other words, diarization means finding speaker boundaries and grouping the segments that belong to the same speaker; as a by-product, it determines the number of distinct speakers. Combined with speech recognition, diarization enables speaker-attributed speech-to-text transcription.
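The definition above can be made concrete with a minimal sketch. It assumes diarization output is a list of non-overlapping `(start, end, label)` segments whose labels are anonymous cluster IDs rather than real identities. The names `Segment`, `speaker_turns`, `num_speakers` and `attribute_words` are hypothetical and not taken from any particular toolkit. Assigning each ASR word to the speaker whose segment overlaps it most is one simple way to get speaker-attributed transcripts, not necessarily how the systems listed below do it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # anonymous cluster label such as "A"; not a real identity


def speaker_turns(segments: list[Segment]) -> list[Segment]:
    """Merge consecutive same-label segments into speaker turns.

    Assumes non-overlapping segments; a new turn starts wherever the label
    changes, so the turn boundaries are the detected speaker changes.
    """
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].label == seg.label:
            prev = turns[-1]
            turns[-1] = Segment(prev.start, max(prev.end, seg.end), prev.label)
        else:
            turns.append(seg)
    return turns


def num_speakers(segments: list[Segment]) -> int:
    """Count distinct labels: the speaker count is a by-product of grouping."""
    return len({seg.label for seg in segments})


def attribute_words(
    words: list[tuple[str, float, float]], segments: list[Segment]
) -> list[tuple[str | None, str]]:
    """Tag each ASR word (text, start, end) with the label it overlaps most.

    Words that overlap no segment get the label None.
    """
    tagged = []
    for text, w_start, w_end in words:
        best_label, best_overlap = None, 0.0
        for seg in segments:
            overlap = min(w_end, seg.end) - max(w_start, seg.start)
            if overlap > best_overlap:
                best_label, best_overlap = seg.label, overlap
        tagged.append((best_label, text))
    return tagged


if __name__ == "__main__":
    hyp = [
        Segment(0.0, 2.0, "A"),
        Segment(2.0, 3.5, "A"),
        Segment(3.5, 6.0, "B"),
        Segment(6.0, 7.0, "A"),
    ]
    print(speaker_turns(hyp))
    print(num_speakers(hyp))
    words = [("hello", 0.2, 0.6), ("there", 3.7, 4.1), ("bye", 6.2, 6.5)]
    print(attribute_words(words, hyp))
```

Because the labels only co-index segments, renaming "A" and "B" consistently would change nothing about the turns, the speaker count, or which words are grouped together.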

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

Papers


Showing 226–250 of 328 papers

Title	Date	Tasks	Status
Unsupervised Speaker Diarization that is Agnostic to Language, Overlap-Aware, and Tuning Free	Jul 25, 2022	speaker-diarization, Speaker Diarization	Unverified
Using Active Speaker Faces for Diarization in TV shows	Mar 30, 2022	Face Clustering, Face Detection	Unverified
Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones	Jul 31, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge	May 27, 2019	Clustering, speaker-diarization	Unverified
VoxLingua107: A Dataset for Spoken Language Recognition	Nov 25, 2020	Action Detection, Activity Detection	Unverified
VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering	May 22, 2025	Question Answering, RAG	Unverified
Weakly Supervised Training of Speaker Identification Models	Jun 22, 2018	speaker-diarization, Speaker Diarization	Unverified
An approach to optimize inference of the DIART speaker diarization pipeline	Aug 5, 2024	Inference Optimization, Knowledge Distillation	Unverified
SCDiar: a streaming diarization system based on speaker change detection and speech recognition	Jan 28, 2025	Change Detection, speaker-diarization	Unverified
SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition	Jun 15, 2025	Decoder, speaker-diarization	Unverified
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models	Jan 14, 2025	speaker-diarization, Speaker Diarization	Unverified
Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models	Jun 16, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Segmentation et Regroupement en Locuteurs d'une collection de documents audio (Cross-show speaker diarization) [in French]	Jun 1, 2012	speaker-diarization, Speaker Diarization	Unverified
Self-supervised learning for audio-visual speaker diarization	Feb 13, 2020	Self-Supervised Learning, speaker-diarization	Unverified
Self-supervised Speaker Diarization	Apr 8, 2022	speaker-diarization, Speaker Diarization	Unverified
Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech	Apr 8, 2020	Acoustic Modelling, Action Detection	Unverified
Semi-supervised Acoustic Modelling for Five-lingual Code-switched ASR using Automatically-segmented Soap Opera Speech	May 1, 2020	Acoustic Modelling, Action Detection	Unverified
Semi-supervised acoustic model training for speech with code-switching	Oct 23, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Semi-supervised multi-channel speaker diarization with cross-channel attention	Jul 17, 2023	speaker-diarization, Speaker Diarization	Unverified
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors	Mar 20, 2025	speaker-diarization, Speaker Diarization	Unverified
Separation Guided Speaker Diarization in Realistic Mismatched Conditions	Jul 6, 2021	Clustering, speaker-diarization	Unverified
Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation	Nov 21, 2024	Action Detection, Activity Detection	Unverified
Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios	Jun 17, 2022	Action Detection, Activity Detection	Unverified
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models	Sep 17, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens	Sep 10, 2024	speaker-diarization, Speaker Diarization	Unverified


Datasets: CALLHOME, NIST-SRE 2000, AMI Lapel, AMI MixHeadset, CH109, DIHARD, ETAPE, AMI, CALLHOME-109, AliMeeting, DIHARD II, Hub5'00 CallHome

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	COS+NJW-SC (Oracle SAD)	DER(%)	24.05	—	Unverified
2	EEND	DER(%)	23.07	—	Unverified
3	COS+AHC (Oracle SAD)	DER(%)	21.13	—	Unverified
4	SA-EEND (2-spk, no-adapt)	DER(%)	12.66	—	Unverified
5	EEND-OLA	DER(%)	12.57	—	Unverified
6	SA-EEND (2-spk, adapted)	DER(%)	10.76	—	Unverified
7	TOLD	DER(%)	10.14	—	Unverified
8	COS+B-SC (Oracle SAD)	DER(%, overlap ignored)	8.78	—	Unverified
9	PLDA+AHC (Oracle SAD)	DER(%, overlap ignored)	8.39	—	Unverified
10	COS+NME-SC (Oracle SAD)	DER(%, overlap ignored)	7.29	—	Unverified
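The scores in these tables are diarization error rates (DER). Under the standard NIST-style definition, DER is the sum of missed speech, false-alarm speech and speaker confusion, divided by the total reference speech. Because hypothesis labels are anonymous, they are first mapped one-to-one onto reference speakers in whichever way minimises the error. The sketch below is a simplified, frame-based illustration of that definition and is not the scorer behind these numbers. Its assumptions: `frame_der` is a hypothetical name; it uses a brute-force mapping instead of the Hungarian algorithm; it has no forgiveness collar (real scorers such as NIST's md-eval typically apply one and work on time segments); and it treats the overlap-ignoring variant reported for some entries as "drop frames where the reference has more than one active speaker".

```python
from itertools import permutations


def frame_der(ref, hyp, ignore_overlap=False):
    """Frame-level diarization error rate, returned as a fraction (x100 for %).

    ref, hyp: equal-length sequences, one entry per frame; each entry is the set
    of speaker labels active in that frame (an empty set means non-speech).
    Hypothesis labels are anonymous and need not match reference labels.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same frames")
    frames = list(zip(ref, hyp))
    if ignore_overlap:
        # Assumed reading of "ignore overlap": drop frames in which the
        # reference has more than one active speaker.
        frames = [(r, h) for r, h in frames if len(r) <= 1]

    total_ref = sum(len(r) for r, _ in frames)
    if total_ref == 0:
        raise ValueError("no reference speech to score")

    ref_labels = sorted({s for r, _ in frames for s in r})
    hyp_labels = sorted({s for _, h in frames for s in h})
    # Padding with None lets surplus hypothesis speakers stay unmapped.
    candidates = ref_labels + [None] * len(hyp_labels)

    # Brute-force search for the one-to-one mapping that maximises correctly
    # attributed speaker-frames (practical scorers use the Hungarian algorithm).
    best_correct = 0
    for assignment in permutations(candidates, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, assignment))
        correct = sum(sum(1 for s in h if mapping[s] in r) for r, h in frames)
        best_correct = max(best_correct, correct)

    # Per frame, max(N_ref, N_hyp) - N_correct equals
    # missed speech + false alarm + speaker confusion.
    errors = sum(max(len(r), len(h)) for r, h in frames) - best_correct
    return errors / total_ref


if __name__ == "__main__":
    ref = [{"A"}, {"A"}, {"B"}, {"B"}, {"A", "B"}, set()]
    hyp = [{"x"}, {"x"}, {"y"}, {"x"}, {"x"}, set()]
    print(f"DER: {100 * frame_der(ref, hyp):.2f}%")  # DER: 33.33%
    print(f"DER, overlap ignored: {100 * frame_der(ref, hyp, ignore_overlap=True):.2f}%")  # 25.00%
```

In the example, the best mapping is x→A and y→B. The fourth frame is then a speaker confusion, and the overlapped fifth frame loses one missed speaker. That gives 2 errors over 6 reference speaker-frames. Excluding the overlapped frame leaves 1 error over 4, which shows why overlap-ignoring DER is usually lower on meeting-style data.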

#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	8.39	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	6.73	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	6.47	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	6.37	—	Unverified
5	x-vector (MCGAN)	DER(%)	5.73	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ECAPA (SC)	DER(%)	2.36	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	2.03	—	Unverified
3	TitaNet-S (NME-SC)	DER(%)	2	—	Unverified
4	TitaNet-M (NME-SC)	DER(%)	1.99	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TitaNet-S (NME-SC)	DER(%)	2.22	—	Unverified
2	TitaNet-M (NME-SC)	DER(%)	1.79	—	Unverified
3	ECAPA (SC)	DER(%)	1.78	—	Unverified
4	TitaNet-L (NME-SC)	DER(%)	1.73	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	9.72	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	1.19	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	1.13	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	1.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline (the best result in the literature as of Oct. 2019)	DER(%)	11.2	—	Unverified
2	pyannote (MFCC)	DER(%)	10.5	—	Unverified
3	pyannote (waveform)	DER(%)	9.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline	DER(%)	7.7	—	Unverified
2	pyannote (MFCC)	DER(%)	5.6	—	Unverified
3	pyannote (waveform)	DER(%)	4.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	pyannote (MFCC)	DER(%)	6.3	—	Unverified
2	pyannote (waveform)	DER(%)	6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	d-vector + spectral	DER(%)	12.54	—	Unverified
2	TitaNet-S	DER(%)	1.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SOND	DER(%)	4.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN-SML	DER(%)	27.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN	V	10.6	—	Unverified