Speaker Diarization

Speaker diarization is the task of segmenting an audio recording and co-indexing the segments by speaker. As commonly defined, the goal is not to identify known speakers but to co-index segments attributed to the same speaker. Diarization therefore involves finding speaker boundaries, grouping the segments that belong to the same speaker and, as a by-product, determining the number of distinct speakers. Combined with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
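
The grouping step described above can be illustrated with a minimal sketch. It assumes each segment already comes with a fixed-length, non-zero speaker embedding, and it uses average-linkage agglomerative clustering on cosine similarity with a stopping threshold. The toy 2-D embeddings, the 0.8 threshold and the names `cosine` and `diarize` are hypothetical choices for illustration, not the method of the cited source. The number of speakers is not given in advance; it falls out of where merging stops.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diarize(segments, threshold=0.8):
    """Assign a speaker index to each (start, end, embedding) segment.

    Clusters are merged greedily while the best average pairwise
    similarity between two clusters stays at or above `threshold`.
    """
    clusters = [[i] for i in range(len(segments))]

    def linkage(c1, c2):
        # Average similarity over all cross-cluster segment pairs.
        sims = [cosine(segments[i][2], segments[j][2]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = float("-inf"), None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = linkage(clusters[a], clusters[b])
                if score > best:
                    best, pair = score, (a, b)
        if best < threshold:
            break  # remaining clusters are treated as distinct speakers
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Number speakers by order of first appearance in the recording.
    clusters.sort(key=min)
    label = {i: k for k, members in enumerate(clusters) for i in members}
    return [(start, end, label[i]) for i, (start, end, _) in enumerate(segments)]


# Toy input: two alternating speakers with made-up 2-D embeddings.
segments = [
    (0.0, 2.0, [1.0, 0.1]),
    (2.0, 3.5, [0.1, 1.0]),
    (3.5, 5.0, [0.9, 0.2]),
    (5.0, 6.0, [0.2, 0.9]),
]
result = diarize(segments)
num_speakers = len({speaker for _, _, speaker in result})
```

On this toy input the first and third segments share one index and the second and fourth share another, so `num_speakers` comes out as 2 without being specified. Real systems typically use learned embeddings and more careful stopping criteria.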

Papers

Showing 301–328 of 328 papers

Title	Date	Tasks	Status	Hype
Triplet Network with Attention for Speaker Diarization	Aug 4, 2018	Metric Learning, Speaker Diarization	Unverified	0
Indigenous language technologies in Canada: Assessment, challenges, and successes	Aug 1, 2018	Machine Translation, Optical Character Recognition	Unverified	0
Weakly Supervised Training of Speaker Identification Models	Jun 22, 2018	Speaker Diarization	Unverified	0
An automated medical scribe for documenting clinical encounters	Jun 1, 2018	Speaker Diarization	Unverified	0
Role-specific Language Models for Processing Recorded Neuropsychological Exams	Jun 1, 2018	Automatic Speech Recognition (ASR)	Unverified	0
Multimodal Speaker Segmentation and Diarization using Lexical and Acoustic Cues via Sequence to Sequence Neural Networks	May 28, 2018	Automatic Speech Recognition (ASR)	Unverified	0
Computer-assisted Speaker Diarization: How to Evaluate Human Corrections	May 1, 2018	Active Learning, Face Recognition	Unverified	0
Matics Software Suite: New Tools for Evaluation and Data Exploration	May 1, 2018	Optical Character Recognition (OCR), Speaker Diarization	Unverified	0
基於i-vector與PLDA並使用GMM-HMM強制對位之自動語者分段標記系統 (Speaker Diarization based on I-vector PLDA Scoring and using GMM-HMM Forced Alignment) [In Chinese]	Nov 1, 2017	Speaker Diarization	Unverified	0
Speaker Diarization with LSTM	Oct 28, 2017	Clustering, Speaker Diarization	Code Available	1
Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings	Aug 9, 2017	Speaker Diarization	Unverified	0
An Infinite Hidden Markov Model With Similarity-Biased Transitions	Jul 21, 2017	Speaker Diarization	Unverified	0
Polish Read Speech Corpus for Speech Tools and Services	Jun 1, 2017	Action Detection, Activity Detection	Unverified	0
A framework for the automatic inference of stochastic turn-taking styles	Sep 1, 2016	Speaker Diarization	Unverified	0
Autoapprentissage pour le regroupement en locuteurs : premières investigations (First investigations on self trained speaker diarization)	Jul 1, 2016	Domain Adaptation, Speaker Diarization	Unverified	0
Speech Trax: A Bottom to the Top Approach for Speaker Tracking and Indexing in an Archiving Context	May 1, 2016	Speaker Diarization	Unverified	0
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion	Mar 31, 2016	Clustering, Speaker Diarization	Unverified	0
Scalable Adaptation of State Complexity for Nonparametric Hidden Markov Models	Dec 1, 2015	Speaker Diarization	Code Available	0
Unsupervised Adaptation of SPLDA	Nov 20, 2015	Speaker Diarization	Unverified	0
New bilingual speech databases for audio diarization	May 1, 2014	Speaker Diarization	Unverified	0
An Effortless Way To Create Large-Scale Datasets For Famous Speakers	May 1, 2014	Person Identification, Speaker Diarization	Unverified	0
The ETAPE speech processing evaluation	May 1, 2014	Automatic Speech Recognition (ASR)	Unverified	0
Multi-modal Sensing and Analysis of Poster Conversations: Toward Smart Posterboard	Jul 1, 2012	Speaker Diarization, Spoken Dialogue Systems	Unverified	0
Segmentation et Regroupement en Locuteurs d'une collection de documents audio (Cross-show speaker diarization) [in French]	Jun 1, 2012	Speaker Diarization	Unverified	0
Percol0 - un système multimodal de détection de personnes dans des documents vidéo (Percol0 - A multimodal person detection system in video documents) [in French]	Jun 1, 2012	Face Detection, Human Detection	Unverified	0
Nouvelle approche pour le regroupement des locuteurs dans des émissions radiophoniques et télévisuelles (New approach for speaker clustering of broadcast news) [in French]	Jun 1, 2012	Clustering, Speaker Diarization	Unverified	0
An Alternative to Low-level-Sychrony-Based Methods for Speech Detection	Dec 1, 2010	Facial Expression Recognition (FER)	Unverified	0
A sticky HDP-HMM with application to speaker diarization	May 15, 2009	Speaker Diarization	Unverified	0

Datasets: CALLHOME, NIST-SRE 2000, AMI Lapel, AMI MixHeadset, CH109, DIHARD, ETAPE, AMI, CALLHOME-109, AliMeeting, DIHARD II, Hub5'00 CallHome

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	COS+NJW-SC (Oracle SAD)	DER(%)	24.05	—	Unverified
2	EEND	DER(%)	23.07	—	Unverified
3	COS+AHC (Oracle SAD)	DER(%)	21.13	—	Unverified
4	SA-EEND (2-spk, no-adapt)	DER(%)	12.66	—	Unverified
5	EEND-OLA	DER(%)	12.57	—	Unverified
6	SA-EEND (2-spk, adapted)	DER(%)	10.76	—	Unverified
7	TOLD	DER(%)	10.14	—	Unverified
8	COS+B-SC (Oracle SAD)	DER(%, ignoring overlap)	8.78	—	Unverified
9	PLDA+AHC (Oracle SAD)	DER(%, ignoring overlap)	8.39	—	Unverified
10	COS+NME-SC (Oracle SAD)	DER(%, ignoring overlap)	7.29	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	8.39	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	6.73	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	6.47	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	6.37	—	Unverified
5	x-vector (MCGAN)	DER(%)	5.73	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	ECAPA (SC)	DER(%)	2.36	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	2.03	—	Unverified
3	TitaNet-S (NME-SC)	DER(%)	2.00	—	Unverified
4	TitaNet-M (NME-SC)	DER(%)	1.99	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	TitaNet-S (NME-SC)	DER(%)	2.22	—	Unverified
2	TitaNet-M (NME-SC)	DER(%)	1.79	—	Unverified
3	ECAPA (SC)	DER(%)	1.78	—	Unverified
4	TitaNet-L (NME-SC)	DER(%)	1.73	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	x-vector (PLDA + AHC)	DER(%)	9.72	—	Unverified
2	TitaNet-L (NME-SC)	DER(%)	1.19	—	Unverified
3	TitaNet-M (NME-SC)	DER(%)	1.13	—	Unverified
4	TitaNet-S (NME-SC)	DER(%)	1.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline (the best result in the literature as of Oct.2019)	DER(%)	11.2	—	Unverified
2	pyannote (MFCC)	DER(%)	10.5	—	Unverified
3	pyannote (waveform)	DER(%)	9.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Baseline	DER(%)	7.7	—	Unverified
2	pyannote (MFCC)	DER(%)	5.6	—	Unverified
3	pyannote (waveform)	DER(%)	4.9	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	pyannote (MFCC)	DER(%)	6.3	—	Unverified
2	pyannote (waveform)	DER(%)	6.0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	d-vector + spectral	DER(%)	12.54	—	Unverified
2	titanet-s	DER(%)	1.11	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SOND	DER(%)	4.46	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN-SML	DER(%)	27.3	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	UIS-RNN	V	10.6	—	Unverified
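
Most of the tables above report DER, the diarization error rate, where lower is better. As a rough illustration of what the metric measures, the sketch below computes a simplified frame-level DER: missed speech, false-alarm speech and speaker confusion, divided by the total reference speech. It makes several assumptions: one label per frame (no overlapping speech), no forgiveness collar around boundaries, and a brute-force search for the best one-to-one speaker mapping, which only suits a handful of speakers. The function name `der` and the toy labels are hypothetical. Published figures are normally produced with standard scoring tools whose settings vary between papers, so this sketch would not reproduce the numbers listed here.

```python
from itertools import permutations


def der(ref, hyp):
    """Simplified frame-level diarization error rate.

    `ref` and `hyp` are equal-length lists with one entry per frame:
    a speaker label, or None for non-speech.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same frames")
    ref_speech = sum(r is not None for r in ref)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))

    # Count frames where both sides have speech, per (ref, hyp) label pair.
    overlap = {}
    both = 0
    for r, h in zip(ref, hyp):
        if r is not None and h is not None:
            both += 1
            overlap[(r, h)] = overlap.get((r, h), 0) + 1

    ref_spk = list(dict.fromkeys(r for r in ref if r is not None))
    hyp_spk = list(dict.fromkeys(h for h in hyp if h is not None))
    # Pad with None so every reference speaker can stay unmatched.
    padded = hyp_spk + [None] * max(0, len(ref_spk) - len(hyp_spk))

    # Brute-force the one-to-one mapping that maximizes agreement.
    best_correct = 0
    for perm in permutations(padded, len(ref_spk)):
        correct = sum(overlap.get((r, h), 0) for r, h in zip(ref_spk, perm))
        best_correct = max(best_correct, correct)

    confusion = both - best_correct
    return (missed + false_alarm + confusion) / ref_speech


# Toy example: 10 frames, 8 of them reference speech.
ref = ["A", "A", "A", "A", None, None, "B", "B", "B", "B"]
hyp = ["x", "x", "x", "y", "y", None, "y", "y", "y", None]
print(f"DER = {100 * der(ref, hyp):.1f}%")  # prints "DER = 37.5%"
```

In the toy example one frame is missed, one is a false alarm and one is attributed to the wrong speaker, giving 3 errors over 8 reference speech frames.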