Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.
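As a rough illustration of the definition above, diarization output can be treated as a list of time segments carrying anonymous speaker labels. The sketch below is not taken from any listed paper: `Segment`, `merge_adjacent`, `speaker_count`, and `attribute_words` are hypothetical names, and the labels are assumed to come from some upstream clustering step. It groups adjacent segments with the same label, counts distinct speakers as a by-product, and attaches labels to timestamped words, as a speech recognizer might supply them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # anonymous cluster label, not a known identity


def merge_adjacent(segments, max_gap=0.0):
    """Merge consecutive segments that carry the same speaker label."""
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            merged[-1] = Segment(last.start, max(last.end, seg.end), seg.speaker)
        else:
            merged.append(seg)
    return merged


def speaker_count(segments):
    """Number of distinct speakers, obtained as a by-product of the labels."""
    return len({s.speaker for s in segments})


def attribute_words(words, segments):
    """Pair each (text, time) word with the label of the segment containing it."""
    attributed = []
    for text, time in words:
        speaker = next((s.speaker for s in segments if s.start <= time < s.end), None)
        attributed.append((speaker, text))
    return attributed


# Hypothetical segments and word timings, for illustration only.
segments = [
    Segment(0.0, 1.5, "spk0"),
    Segment(1.5, 3.0, "spk0"),
    Segment(3.0, 4.2, "spk1"),
    Segment(4.2, 5.0, "spk0"),
]
merged = merge_adjacent(segments)
transcript = attribute_words([("hello", 0.4), ("there", 2.0), ("hi", 3.5)], merged)
```

The labels `spk0` and `spk1` only say which segments belong together; mapping them to named people would be a separate speaker identification step.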

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

Papers

Showing 176–200 of 328 papers

Title	Date	Tasks	Status
Speaker Diarization With Lexical Information	Nov 27, 2018	Clustering, speaker-diarization	Unverified
Speaker Diarization with Lexical Information	Apr 13, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Speaker diarization with session-level speaker embedding refinement using graph neural networks	May 22, 2020	Clustering, speaker-diarization	Unverified
Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios	Mar 18, 2022	Action Detection, Activity Detection	Unverified
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization	May 15, 2024	Action Detection, Activity Detection	Unverified
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition	Dec 18, 2023	speaker-diarization, Speaker Diarization	Unverified
Speaker Recognition Based on Deep Learning: An Overview	Dec 2, 2020	Deep Learning, Domain Adaptation	Unverified
Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization	Jun 26, 2024	Clustering, Form	Unverified
Speaker Tagging Correction With Non-Autoregressive Language Models	Aug 30, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Speech Trax: A Bottom to the Top Approach for Speaker Tracking and Indexing in an Archiving Context	May 1, 2016	speaker-diarization, Speaker Diarization	Unverified
Summary of the DISPLACE Challenge 2023 -- DIarization of SPeaker and LAnguage in Conversational Environments	Nov 21, 2023	speaker-diarization, Speaker Diarization	Unverified
Systematic Evaluation of Online Speaker Diarization Systems Regarding their Latency	Jul 5, 2024	Online Clustering, Segmentation	Unverified
System Description for the Displace Speaker Diarization Challenge 2023	Jun 20, 2024	Clustering, speaker-diarization	Unverified
Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system	Mar 9, 2020	All, speaker-diarization	Unverified
TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024	Jul 17, 2024	speaker-diarization, Speaker Diarization	Unverified
Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario	May 14, 2020	Action Detection, Activity Detection	Unverified
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction	Oct 28, 2022	Action Detection, Activity Detection	Unverified
Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker	Aug 7, 2021	Action Detection, Activity Detection	Unverified
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization	Aug 27, 2022	Action Detection, Activity Detection	Unverified
Target Speech Diarization with Multimodal Prompts	Jun 11, 2024	speaker-diarization, Speaker Diarization	Unverified
TCG CREST System Description for the Second DISPLACE Challenge	Sep 16, 2024	Action Detection, Activity Detection	Unverified
The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences	Jun 14, 2024	Depth Estimation, Image Segmentation	Unverified
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System	Oct 18, 2023	Automatic Speech Recognition, speaker-diarization	Unverified
The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge	Feb 4, 2022	Action Detection, Activity Detection	Unverified
The DKU-DukeECE-Lenovo System for the Diarization Task of the 2021 VoxCeleb Speaker Recognition Challenge	Sep 5, 2021	Action Detection, Activity Detection	Unverified

Datasets: CALLHOME, NIST-SRE 2000, AMI Lapel, AMI MixHeadset, CH109, DIHARD, ETAPE, AMI, CALLHOME-109, AliMeeting, DIHARD II, Hub5'00 CallHome

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	COS+NJW-SC (Oracle SAD)	DER(%)	24.05	—	Unverified
2	EEND	DER(%)	23.07	—	Unverified
3	COS+AHC (Oracle SAD)	DER(%)	21.13	—	Unverified
4	SA-EEND (2-spk, no-adapt)	DER(%)	12.66	—	Unverified
5	EEND-OLA	DER(%)	12.57	—	Unverified
6	SA-EEND (2-spk, adapted)	DER(%)	10.76	—	Unverified
7	TOLD	DER(%)	10.14	—	Unverified
8	COS+B-SC (Oracle SAD)	DER(%, ignoring overlap)	8.78	—	Unverified
9	PLDA+AHC (Oracle SAD)	DER(%, ignoring overlap)	8.39	—	Unverified
10	COS+NME-SC (Oracle SAD)	DER(%, ignoring overlap)	7.29	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	8.39	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	6.73	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	6.47	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	6.37	—	Unverified
5	x-vector (MCGAN)	DER(%)	5.73	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ECAPA (SC)	DER(%)	2.36	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	2.03	—	Unverified
3	TitaNet-S (NME-SC)	DER(%)	2.00	—	Unverified
4	TitaNet-M (NME-SC)	DER(%)	1.99	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TitaNet-S (NME-SC)	DER(%)	2.22	—	Unverified
2	TitaNet-M (NME-SC)	DER(%)	1.79	—	Unverified
3	ECAPA (SC)	DER(%)	1.78	—	Unverified
4	TitaNet-L (NME-SC)	DER(%)	1.73	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	9.72	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	1.19	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	1.13	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	1.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline (the best result in the literature as of Oct. 2019)	DER(%)	11.2	—	Unverified
2	pyannote (MFCC)	DER(%)	10.5	—	Unverified
3	pyannote (waveform)	DER(%)	9.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline	DER(%)	7.7	—	Unverified
2	pyannote (MFCC)	DER(%)	5.6	—	Unverified
3	pyannote (waveform)	DER(%)	4.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	pyannote (MFCC)	DER(%)	6.3	—	Unverified
2	pyannote (waveform)	DER(%)	6.0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	d-vector + spectral	DER(%)	12.54	—	Unverified
2	titanet-s	DER(%)	1.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SOND	DER(%)	4.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN-SML	DER(%)	27.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN	V	10.6	—	Unverified
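
Most tables above report DER(%), the diarization error rate: missed speech, false-alarm speech, and speaker confusion, summed and divided by the total reference speech. The sketch below is a simplified, frame-level illustration under stated assumptions. It allows one speaker per frame (no overlap), applies no forgiveness collar, and searches speaker mappings by brute force, so it is not the scoring procedure behind any number listed here. `frame_der` is a hypothetical name.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER in percent; labels are strings, None marks non-speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    ref_speech = sum(r is not None for r, _ in pairs)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_labels = sorted({r for r in reference if r is not None})
    hyp_labels = sorted({h for h in hypothesis if h is not None})
    # Pad with None so surplus hypothesis speakers can stay unmatched.
    candidates = ref_labels + [None] * max(0, len(hyp_labels) - len(ref_labels))

    # Hypothesis labels are anonymous, so score under the one-to-one
    # mapping to reference speakers that yields the least confusion.
    confusion = len(both)
    for perm in permutations(candidates, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        confusion = min(confusion, sum(mapping[h] != r for r, h in both))

    return 100.0 * (missed + false_alarm + confusion) / ref_speech


# Hypothetical frame labels: one missed frame, one false alarm, one confusion.
reference = ["A", "A", "A", "B", "B", None, "B", "B"]
hypothesis = ["x", "x", "y", "y", "y", "y", None, "y"]
der = frame_der(reference, hypothesis)  # 3 error frames over 7 reference speech frames
```

The brute-force search grows factorially with the number of speakers. Scoring tools typically work on segment durations and solve the mapping as an assignment problem.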