SOTAVerified

Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
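
As a rough illustration of the definition above, the sketch below co-indexes segments by speaker with a greedy cosine-similarity clustering. Everything in it is assumed for illustration: the segments are taken to be already cut at speaker boundaries and to carry a fixed-size embedding, and the names `cosine`, `co_index`, the toy embeddings, and the 0.8 threshold are hypothetical rather than taken from any system listed on this page.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def co_index(segments, threshold=0.8):
    """Assign an anonymous speaker index to each (start, end, embedding) segment.

    A segment joins the most similar existing speaker if the similarity
    reaches `threshold`; otherwise it opens a new speaker. The number of
    speakers falls out as a by-product.
    """
    centroids = []  # running sum of embeddings per speaker (scale does not affect cosine)
    labels = []
    for start, end, emb in segments:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
        labels.append((start, end, best))
    return labels, len(centroids)


# Toy input: three segments with made-up 2-D embeddings.
segments = [
    (0.0, 2.1, [1.0, 0.1]),
    (2.1, 4.0, [0.1, 1.0]),
    (4.0, 5.5, [0.9, 0.2]),
]
labels, n_speakers = co_index(segments)
print(labels, n_speakers)
```

The output indices are arbitrary labels, not identities: the first and third segments share an index only because their embeddings are close, which matches the point that diarization groups segments without identifying known speakers.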

Papers

Showing 101-150 of 328 papers

| Title | Status | Hype |
| --- | --- | --- |
| Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation | | 0 |
| End-to-end Online Speaker Diarization with Target Speaker Tracking | | 0 |
| One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition | | 0 |
| Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors | Code | 1 |
| NTT speaker diarization system for CHiME-7: multi-domain, multi-microphone End-to-end and vector clustering diarization | | 0 |
| Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation | | 0 |
| Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture | Code | 1 |
| Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network | | 0 |
| DiaCorrect: Error Correction Back-end For Speaker Diarization | Code | 1 |
| Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (Extended Version) | | 0 |
| DiariST: Streaming Speech Translation with Speaker Diarization | Code | 1 |
| Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis | | 0 |
| Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach | Code | 1 |
| The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge | | 0 |
| Implicit Self-supervised Language Representation for Spoken Language Diarization | | 0 |
| Home monitoring for frailty detection through sound and speaker diarization analysis | | 0 |
| GIST-AiTeR Speaker Diarization System for VoxCeleb Speaker Recognition Challenge (VoxSRC) 2023 | | 0 |
| Speaker Diarization of Scripted Audiovisual Content | | 0 |
| Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains | | 0 |
| Semi-supervised multi-channel speaker diarization with cross-channel attention | | 0 |
| Long-term Conversation Analysis: Exploring Utility and Privacy | Code | 0 |
| Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization | | 0 |
| Implicit spoken language diarization | | 0 |
| Speech Emotion Diarization: Which Emotion Appears When? | Code | 1 |
| Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction | | 0 |
| Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features | | 0 |
| Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks | Code | 1 |
| An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings | | 0 |
| Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization | | 0 |
| Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization | | 0 |
| Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio | | 0 |
| Neural Diarization with Non-autoregressive Intermediate Attractors | Code | 0 |
| TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization | Code | 0 |
| A Light Weight Model for Active Speaker Detection | Code | 1 |
| Improving Transformer-based End-to-End Speaker Diarization by Assigning Auxiliary Losses to Attention Heads | | 0 |
| DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments | | 0 |
| Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization | Code | 0 |
| A Reinforcement Learning Framework for Online Speaker Diarization | | 0 |
| VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge | Code | 1 |
| Towards Measuring and Scoring Speaker Diarization Fairness | | 0 |
| The Newsbridge -Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description | | 0 |
| BER: Balanced Error Rate For Speaker Diarization | Code | 1 |
| Late Audio-Visual Fusion for In-The-Wild Speaker Diarization | | 0 |
| A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings | | 0 |
| DiaCorrect: End-to-end error correction for speaker diarization | Code | 0 |
| Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction | | 0 |
| On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors | Code | 0 |
| Privacy-preserving Automatic Speaker Diarization | | 0 |
| TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge | | 0 |
| Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering | Code | 2 |

Benchmark Results

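| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | COS+NJW-SC (Oracle SAD) | DER(%) | 24.05 | | Unverified |
| 2 | EEND | DER(%) | 23.07 | | Unverified |
| 3 | COS+AHC (Oracle SAD) | DER(%) | 21.13 | | Unverified |
| 4 | SA-EEND (2-spk, no-adapt) | DER(%) | 12.66 | | Unverified |
| 5 | EEND-OLA | DER(%) | 12.57 | | Unverified |
| 6 | SA-EEND (2-spk, adapted) | DER(%) | 10.76 | | Unverified |
| 7 | TOLD | DER(%) | 10.14 | | Unverified |
| 8 | COS+B-SC (Oracle SAD) | DER(ig olp) | 8.78 | | Unverified |
| 9 | PLDA+AHC (Oracle SAD) | DER(ig olp) | 8.39 | | Unverified |
| 10 | COS+NME-SC (Oracle SAD) | DER(ig olp) | 7.29 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | x-vector (PLDA + AHC) | DER(%) | 8.39 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 6.73 | | Unverified |
| 3 | TitaNet-M (NME-SC) | DER(%) | 6.47 | | Unverified |
| 4 | TitaNet-S (NME-SC) | DER(%) | 6.37 | | Unverified |
| 5 | x-vector (MCGAN) | DER(%) | 5.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ECAPA (SC) | DER(%) | 2.36 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 2.03 | | Unverified |
| 3 | TitaNet-S (NME-SC) | DER(%) | 2 | | Unverified |
| 4 | TitaNet-M (NME-SC) | DER(%) | 1.99 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TitaNet-S (NME-SC) | DER(%) | 2.22 | | Unverified |
| 2 | TitaNet-M (NME-SC) | DER(%) | 1.79 | | Unverified |
| 3 | ECAPA (SC) | DER(%) | 1.78 | | Unverified |
| 4 | TitaNet-L (NME-SC) | DER(%) | 1.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | x-vector (PLDA + AHC) | DER(%) | 9.72 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 1.19 | | Unverified |
| 3 | TitaNet-M (NME-SC) | DER(%) | 1.13 | | Unverified |
| 4 | TitaNet-S (NME-SC) | DER(%) | 1.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Baseline (the best result in the literature as of Oct.2019) | DER(%) | 11.2 | | Unverified |
| 2 | pyannote (MFCC) | DER(%) | 10.5 | | Unverified |
| 3 | pyannote (waveform) | DER(%) | 9.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Baseline | DER(%) | 7.7 | | Unverified |
| 2 | pyannote (MFCC) | DER(%) | 5.6 | | Unverified |
| 3 | pyannote (waveform) | DER(%) | 4.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | pyannote (MFCC) | DER(%) | 6.3 | | Unverified |
| 2 | pyannote (waveform) | DER(%) | 6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | d-vector + spectral | DER(%) | 12.54 | | Unverified |
| 2 | titanet-s | DER(%) | 1.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SOND | DER(%) | 4.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | UIS-RNN-SML | DER(%) | 27.3 | | Unverified |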

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | UIS-RNN | V | 10.6 | | Unverified |
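
Most of the tables above report diarization error rate (DER). A common definition is the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time, scored under the best one-to-one mapping between reference and hypothesis speaker labels. The sketch below is a simplified frame-level version for illustration only: it assumes at most one speaker per frame (no overlap), no forgiveness collar, and a brute-force search over mappings. The function `frame_der` and the toy label sequences are hypothetical and are not tied to any benchmark or scoring tool behind the numbers on this page.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER as a fraction; None marks a non-speech frame."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    ref_speech = sum(r is not None for r in reference)
    if ref_speech == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_spk = list({r for r in reference if r is not None})
    hyp_spk = list({h for h in hypothesis if h is not None})
    # Pad with None so every reference speaker can also stay unmatched.
    padded = hyp_spk + [None] * max(0, len(ref_spk) - len(hyp_spk))

    # Brute-force the one-to-one mapping that maximizes agreeing frames.
    best_matched = 0
    for perm in permutations(padded, len(ref_spk)):
        mapping = dict(zip(ref_spk, perm))
        matched = sum(mapping[r] == h for r, h in both)
        best_matched = max(best_matched, matched)

    confusion = len(both) - best_matched
    return (missed + false_alarm + confusion) / ref_speech


# Toy example: hypothesis labels are arbitrary and need not match reference names.
reference = ["A", "A", "A", "B", "B", None, "B", "B"]
hypothesis = ["x", "x", "y", "y", "y", "y", None, "y"]
der = frame_der(reference, hypothesis)
print(f"DER = {100 * der:.2f}%")  # 1 missed + 1 false alarm + 1 confused over 7 speech frames
```

Brute-force mapping is only practical for a handful of speakers; scoring tools typically solve the same assignment with an optimal matching algorithm instead, and their handling of overlap and collars can change the reported figure.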