SOTAVerified

Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
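The definition above can be illustrated with a minimal sketch. It assumes a diarizer has already produced time segments with anonymous labels and a speech recognizer has produced word timings. The `Segment` and `Word` types, the `spk0`/`spk1` labels, and the maximum-overlap rule are hypothetical choices made for illustration and are not taken from any paper listed on this page.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str   # anonymous co-index label (e.g. "spk0"), not a known identity


@dataclass
class Word:
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(words, segments):
    """Label each word with the speaker whose segment overlaps it the most."""
    result = []
    for w in words:
        best = max(
            segments,
            key=lambda s: overlap(w.start, w.end, s.start, s.end),
            default=None,
        )
        if best is None or overlap(w.start, w.end, best.start, best.end) == 0.0:
            result.append(("unknown", w.text))
        else:
            result.append((best.speaker, w.text))
    return result


def num_speakers(segments):
    """The speaker count falls out as a by-product of the co-indexing."""
    return len({s.speaker for s in segments})


# Hypothetical diarization output: two speakers, the first one returns later.
segments = [
    Segment(0.0, 2.0, "spk0"),
    Segment(2.0, 4.5, "spk1"),
    Segment(4.5, 6.0, "spk0"),
]
# Hypothetical word timings from a speech recognizer.
words = [
    Word(0.2, 0.6, "hello"),
    Word(0.7, 1.5, "there"),
    Word(2.1, 2.8, "hi"),
    Word(4.6, 5.2, "thanks"),
]

transcript = attribute_words(words, segments)
speaker_count = num_speakers(segments)
```

The first and third segments share the label `spk0`, which is the co-indexing the definition describes: the label says "same speaker as before" without saying who that speaker is.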

Papers

Showing 51-100 of 328 papers

| Title | Status | Hype |
|---|---|---|
| Speaker Diarization with LSTM | Code | 1 |
| The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines | Code | 1 |
| Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm | Code | 1 |
| End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors | Code | 1 |
| DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors | Code | 1 |
| Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach | Code | 1 |
| CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning Speaker Count Estimation | Code | 0 |
| Visualizations of Complex Sequences of Family-Infant Vocalizations Using Bag-of-Audio-Words Approach Based on Wav2vec 2.0 Features | Code | 0 |
| Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions | Code | 0 |
| A Comprehensive Evaluation of Incremental Speech Recognition and Diarization for Conversational AI | Code | 0 |
| Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers | Code | 0 |
| TOLD: A Novel Two-Stage Overlap-Aware Framework for Speaker Diarization | Code | 0 |
| Compositional Clustering: Applications to Multi-Label Object Recognition and Speaker Identification | Code | 0 |
| The EURECOM Submission to the First DIHARD Challenge | Code | 0 |
| The Second DIHARD Diarization Challenge: Dataset, task, and baselines | Code | 0 |
| Supervised online diarization with sample mean loss for multi-domain data | Code | 0 |
| Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information | Code | 0 |
| 3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization | Code | 0 |
| Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios | Code | 0 |
| Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization | Code | 0 |
| Self-Tuning Spectral Clustering for Speaker Diarization | Code | 0 |
| Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens | Code | 0 |
| End-to-End Neural Speaker Diarization with Permutation-Free Objectives | Code | 0 |
| Self-Supervised Metric Learning With Graph Clustering For Speaker Diarization | Code | 0 |
| Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization | Code | 0 |
| Probabilistic embeddings for speaker diarization | Code | 0 |
| Robust speaker recognition using unsupervised adversarial invariance | Code | 0 |
| EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers | Code | 0 |
| Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment | Code | 0 |
| Powerset multi-class cross entropy loss for neural speaker diarization | Code | 0 |
| Scalable Adaptation of State Complexity for Nonparametric Hidden Markov Models | Code | 0 |
| Speaker Diarization using Two-pass Leave-One-Out Gaussian PLDA Clustering of DNN Embeddings | Code | 0 |
| Neural Speaker Diarization with Speaker-Wise Chain Rule | Code | 0 |
| On Out-of-Distribution Detection for Audio with Deep Nearest Neighbors | Code | 0 |
| Neural Diarization with Non-autoregressive Intermediate Attractors | Code | 0 |
| On the calibration of powerset speaker diarization models | Code | 0 |
| LSTM based Similarity Measurement with Spectral Clustering for Speaker Diarization | Code | 0 |
| Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Code | 0 |
| Multi-Stage Speaker Diarization for Noisy Classrooms | Code | 0 |
| End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization | Code | 0 |
| DiaCorrect: End-to-end error correction for speaker diarization | Code | 0 |
| Fully Supervised Speaker Diarization | Code | 0 |
| Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain | Code | 0 |
| Long-term Conversation Analysis: Exploring Utility and Privacy | Code | 0 |
| Cross-Channel Attention-Based Target Speaker Voice Activity Detection: Experimental Results for M2MeT Challenge | | 0 |
| A sticky HDP-HMM with application to speaker diarization | | 0 |
| Constrained speaker diarization of TV series based on visual patterns | | 0 |
| Computer-assisted Speaker Diarization: How to Evaluate Human Corrections | | 0 |
| Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization | | 0 |
| All-neural online source separation, counting, and diarization for meeting analysis | | 0 |

Benchmark Results

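| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | COS+NJW-SC (Oracle SAD) | DER(%) | 24.05 | | Unverified |
| 2 | EEND | DER(%) | 23.07 | | Unverified |
| 3 | COS+AHC (Oracle SAD) | DER(%) | 21.13 | | Unverified |
| 4 | SA-EEND (2-spk, no-adapt) | DER(%) | 12.66 | | Unverified |
| 5 | EEND-OLA | DER(%) | 12.57 | | Unverified |
| 6 | SA-EEND (2-spk, adapted) | DER(%) | 10.76 | | Unverified |
| 7 | TOLD | DER(%) | 10.14 | | Unverified |
| 8 | COS+B-SC (Oracle SAD) | DER(ig olp) | 8.78 | | Unverified |
| 9 | PLDA+AHC (Oracle SAD) | DER(ig olp) | 8.39 | | Unverified |
| 10 | COS+NME-SC (Oracle SAD) | DER(ig olp) | 7.29 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | x-vector (PLDA + AHC) | DER(%) | 8.39 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 6.73 | | Unverified |
| 3 | TitaNet-M (NME-SC) | DER(%) | 6.47 | | Unverified |
| 4 | TitaNet-S (NME-SC) | DER(%) | 6.37 | | Unverified |
| 5 | x-vector (MCGAN) | DER(%) | 5.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ECAPA (SC) | DER(%) | 2.36 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 2.03 | | Unverified |
| 3 | TitaNet-S (NME-SC) | DER(%) | 2 | | Unverified |
| 4 | TitaNet-M (NME-SC) | DER(%) | 1.99 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | TitaNet-S (NME-SC) | DER(%) | 2.22 | | Unverified |
| 2 | TitaNet-M (NME-SC) | DER(%) | 1.79 | | Unverified |
| 3 | ECAPA (SC) | DER(%) | 1.78 | | Unverified |
| 4 | TitaNet-L (NME-SC) | DER(%) | 1.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | x-vector (PLDA + AHC) | DER(%) | 9.72 | | Unverified |
| 2 | TitaNet-L (NME-SC) | DER(%) | 1.19 | | Unverified |
| 3 | TitaNet-M (NME-SC) | DER(%) | 1.13 | | Unverified |
| 4 | TitaNet-S (NME-SC) | DER(%) | 1.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Baseline (the best result in the literature as of Oct. 2019) | DER(%) | 11.2 | | Unverified |
| 2 | pyannote (MFCC) | DER(%) | 10.5 | | Unverified |
| 3 | pyannote (waveform) | DER(%) | 9.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Baseline | DER(%) | 7.7 | | Unverified |
| 2 | pyannote (MFCC) | DER(%) | 5.6 | | Unverified |
| 3 | pyannote (waveform) | DER(%) | 4.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | pyannote (MFCC) | DER(%) | 6.3 | | Unverified |
| 2 | pyannote (waveform) | DER(%) | 6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | d-vector + spectral | DER(%) | 12.54 | | Unverified |
| 2 | titanet-s | DER(%) | 1.11 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SOND | DER(%) | 4.46 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | UIS-RNN-SML | DER(%) | 27.3 | | Unverified |

#ModelMetricClaimedVerifiedStatus
1UIS-RNNV10.6Unverified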