SOTAVerified

Speaker Diarization

Speaker Diarization is the task of segmenting and co-indexing audio recordings by speaker. The way the task is commonly defined, the goal is not to identify known speakers, but to co-index segments that are attributed to the same speaker; in other words, diarization implies finding speaker boundaries and grouping segments that belong to the same speaker, and, as a by-product, determining the number of distinct speakers. In combination with speech recognition, diarization enables speaker-attributed speech-to-text transcription.

Source: Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm
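The definition above can be made concrete with a small sketch. It is illustrative only: the `Segment` and `Word` types, the `spk0`/`spk1` labels, and the maximum-overlap rule for attaching words to speakers are assumptions made for this example, not part of any cited system. It shows that diarization output is a list of time segments carrying anonymous, co-indexed labels, that the speaker count falls out as a by-product, and that pairing those segments with recognizer word timings gives a speaker-attributed transcript.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    speaker: str  # anonymous co-index such as "spk0", not a known identity


@dataclass
class Word:
    start: float  # seconds
    end: float    # seconds
    text: str


def count_speakers(segments):
    """Number of distinct speaker labels, a by-product of diarization."""
    return len({s.speaker for s in segments})


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute_words(words, segments):
    """Give each word the speaker label whose segments overlap it the most.

    Words that overlap no segment get None as their speaker.
    """
    attributed = []
    for w in words:
        totals = {}
        for s in segments:
            dur = overlap(w.start, w.end, s.start, s.end)
            if dur > 0:
                totals[s.speaker] = totals.get(s.speaker, 0.0) + dur
        speaker = max(totals, key=totals.get) if totals else None
        attributed.append((speaker, w.text))
    return attributed


# Hypothetical diarization output: the first and third segments are co-indexed.
segments = [
    Segment(0.0, 2.0, "spk0"),
    Segment(2.0, 3.5, "spk1"),
    Segment(3.5, 5.0, "spk0"),
]
# Hypothetical word timings from a speech recognizer.
words = [
    Word(0.2, 0.6, "hello"),
    Word(0.7, 1.1, "there"),
    Word(2.1, 2.5, "hi"),
    Word(3.8, 4.3, "thanks"),
]

print(count_speakers(segments))
for speaker, text in attribute_words(words, segments):
    print(speaker, text)
```

The labels carry no identity: renaming `spk0` to any other string changes nothing about the result, which is what distinguishes diarization from speaker identification.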

Papers

Showing 51–100 of 328 papers

Title | Status | Hype
End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification | Code | 1
Speaker Diarization with Region Proposal Network | Code | 1
Phoneme Boundary Detection using Learnable Segmental Features | Code | 1
End-to-End Neural Speaker Diarization with Self-attention | Code | 1
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection | Code | 1
Speaker Diarization with LSTM | Code | 1
Exploring Speaker Diarization with Mixture of Experts | - | 0
M3SD: Multi-modal, Multi-scenario and Multi-language Speaker Diarization Dataset | - | 0
Seewo's Submission to MLC-SLM: Lessons learned from Speech Reasoning Language Models | - | 0
SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition | - | 0
Diarization-Aware Multi-Speaker Automatic Speech Recognition via Large Language Models | - | 0
Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling | - | 0
Pretraining Multi-Speaker Identification for Neural Speaker Diarization | - | 0
Fine-tune Before Structured Pruning: Towards Compact and Accurate Self-Supervised Models for Speaker Diarization | - | 0
VoxRAG: A Step Toward Transcription-Free RAG Systems in Spoken Question Answering | - | 0
Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge | - | 0
HPP-Voice: A Large-Scale Evaluation of Speech Embeddings for Multi-Phenotypic Classification | - | 0
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition | - | 0
Multi-Stage Speaker Diarization for Noisy Classrooms | Code | 0
Speaker Diarization for Low-Resource Languages Through Wav2vec Fine-Tuning | - | 0
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors | - | 0
Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge | - | 0
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond | - | 0
SCDiar: a streaming diarization system based on speaker change detection and speech recognition | - | 0
Language Modelling for Speaker Diarization in Telephonic Interviews | - | 0
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models | - | 0
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | - | 0
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch | - | 0
Comprehensive Audio Query Handling System with Integrated Expert Models and Contextual Understanding | - | 0
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment | Code | 0
Disentangled-Transformer: An Explainable End-to-End Automatic Speech Recognition Model with Speech Content-Context Separation | - | 0
Sequence-to-Sequence Neural Diarization with Automatic Speaker Detection and Representation | - | 0
DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions | - | 0
Guided Speaker Embedding | - | 0
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings | - | 0
On the calibration of powerset speaker diarization models | Code | 0
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR | - | 0
TCG CREST System Description for the Second DISPLACE Challenge | - | 0
Self-Tuning Spectral Clustering for Speaker Diarization | Code | 0
Unified Audio Event Detection | - | 0
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens | Code | 0
A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR | - | 0
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization | - | 0
Recursive Attentive Pooling for Extracting Speaker Embeddings from Multi-Speaker Recordings | - | 0
Speaker Tagging Correction With Non-Autoregressive Language Models | - | 0
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization | - | 0
An approach to optimize inference of the DIART speaker diarization pipeline | - | 0
Long-Term Conversation Analysis: Privacy-Utility Trade-off under Noise and Reverberation | - | 0
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | - | 0
TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024 | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | COS+NJW-SC (Oracle SAD) | DER(%) | 24.05 | - | Unverified
2 | EEND | DER(%) | 23.07 | - | Unverified
3 | COS+AHC (Oracle SAD) | DER(%) | 21.13 | - | Unverified
4 | SA-EEND (2-spk, no-adapt) | DER(%) | 12.66 | - | Unverified
5 | EEND-OLA | DER(%) | 12.57 | - | Unverified
6 | SA-EEND (2-spk, adapted) | DER(%) | 10.76 | - | Unverified
7 | TOLD | DER(%) | 10.14 | - | Unverified
8 | COS+B-SC (Oracle SAD) | DER(ig olp) | 8.78 | - | Unverified
9 | PLDA+AHC (Oracle SAD) | DER(ig olp) | 8.39 | - | Unverified
10 | COS+NME-SC (Oracle SAD) | DER(ig olp) | 7.29 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | x-vector (PLDA + AHC) | DER(%) | 8.39 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 6.73 | - | Unverified
3 | TitaNet-M (NME-SC) | DER(%) | 6.47 | - | Unverified
4 | TitaNet-S (NME-SC) | DER(%) | 6.37 | - | Unverified
5 | x-vector (MCGAN) | DER(%) | 5.73 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ECAPA (SC) | DER(%) | 2.36 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 2.03 | - | Unverified
3 | TitaNet-S (NME-SC) | DER(%) | 2 | - | Unverified
4 | TitaNet-M (NME-SC) | DER(%) | 1.99 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TitaNet-S (NME-SC) | DER(%) | 2.22 | - | Unverified
2 | TitaNet-M (NME-SC) | DER(%) | 1.79 | - | Unverified
3 | ECAPA (SC) | DER(%) | 1.78 | - | Unverified
4 | TitaNet-L (NME-SC) | DER(%) | 1.73 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | x-vector (PLDA + AHC) | DER(%) | 9.72 | - | Unverified
2 | TitaNet-L (NME-SC) | DER(%) | 1.19 | - | Unverified
3 | TitaNet-M (NME-SC) | DER(%) | 1.13 | - | Unverified
4 | TitaNet-S (NME-SC) | DER(%) | 1.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Baseline (the best result in the literature as of Oct. 2019) | DER(%) | 11.2 | - | Unverified
2 | pyannote (MFCC) | DER(%) | 10.5 | - | Unverified
3 | pyannote (waveform) | DER(%) | 9.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Baseline | DER(%) | 7.7 | - | Unverified
2 | pyannote (MFCC) | DER(%) | 5.6 | - | Unverified
3 | pyannote (waveform) | DER(%) | 4.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | pyannote (MFCC) | DER(%) | 6.3 | - | Unverified
2 | pyannote (waveform) | DER(%) | 6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | d-vector + spectral | DER(%) | 12.54 | - | Unverified
2 | titanet-s | DER(%) | 1.11 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SOND | DER(%) | 4.46 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | UIS-RNN-SML | DER(%) | 27.3 | - | Unverified
#ModelMetricClaimedVerifiedStatus
1UIS-RNNV10.6Unverified
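
The tables above report diarization error rate (DER), which is commonly computed as missed speech plus false-alarm speech plus speaker confusion, divided by the total reference speech time. The sketch below is a simplified, frame-level version written for illustration. It assumes at most one speaker per frame (so overlapped speech is not modelled, which is presumably close in spirit to the "DER(ig olp)" rows), applies no forgiveness collar, and finds the best label mapping by brute force, which is only practical for a handful of speakers. The function name `der_percent` is hypothetical and not taken from any scoring toolkit.

```python
from itertools import permutations


def der_percent(reference, hypothesis):
    """Simplified frame-level DER in percent.

    Both inputs hold one speaker label per frame; None marks non-speech.
    Hypothesis labels are anonymous, so they are mapped one-to-one onto
    reference labels in the way that maximises the matched frames.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    pairs = list(zip(reference, hypothesis))
    ref_speech = sum(r is not None for r, _ in pairs)
    if ref_speech == 0:
        raise ValueError("reference contains no speech frames")

    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)

    # Count co-occurrences on frames where both sides contain speech.
    counts = {}
    for r, h in pairs:
        if r is not None and h is not None:
            counts[(r, h)] = counts.get((r, h), 0) + 1
    both = sum(counts.values())

    ref_labels = list({r for r, _ in counts})
    hyp_labels = list({h for _, h in counts})
    # Pad the shorter side with None so every one-to-one mapping is tried.
    n = max(len(ref_labels), len(hyp_labels))
    ref_labels += [None] * (n - len(ref_labels))
    hyp_labels += [None] * (n - len(hyp_labels))

    best_match = 0
    for perm in permutations(ref_labels):
        matched = sum(counts.get((r, h), 0) for r, h in zip(perm, hyp_labels))
        best_match = max(best_match, matched)

    confusion = both - best_match
    return 100.0 * (missed + false_alarm + confusion) / ref_speech


# Toy example: 8 frames, 7 of them reference speech.
reference = ["A", "A", "A", "B", "B", None, "B", "B"]
hypothesis = ["x", "x", "y", "y", "y", "y", None, "y"]
print(round(der_percent(reference, hypothesis), 2))
```

In the toy example there is one missed frame, one false-alarm frame, and one confused frame against seven reference speech frames. Published scores generally rely on time-based scoring tools with their own collar and overlap settings, so numbers from this sketch are not comparable to the table entries.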