SOTAVerified|Agents Browse Leaderboard About Blog

Speaker Identification

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–10 of 248 papers

Title	Date	Tasks	Status	Hype
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit	May 20, 2022	AllAutomatic Speech Recognition (ASR)	CodeCode Available	6
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark	Jul 16, 2024	DiversitySpeaker Identification	CodeCode Available	5
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model	May 20, 2024	Audio ClassificationGPU	CodeCode Available	2
SSAST: Self-Supervised Audio Spectrogram Transformer	Oct 19, 2021	Audio ClassificationClassification	CodeCode Available	2
audino: A Modern Annotation Tool for Audio and Speech	Jun 9, 2020	Action DetectionActivity Detection	CodeCode Available	2
Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings	Mar 13, 2025	Speaker Identificationspeech-recognition	CodeCode Available	1
Disentangling Textual and Acoustic Features of Neural Speech Representations	Oct 3, 2024	DisentanglementEmotion Recognition	CodeCode Available	1
ComiCap: A VLMs pipeline for dense captioning of Comic Panels	Sep 24, 2024	AttributeDense Captioning	CodeCode Available	1
CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding	Jul 4, 2024	Dialogue Generationobject-detection	CodeCode Available	1
InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models	Sep 21, 2023	Emotion RecognitionEmotion Recognition in Conversation	CodeCode Available	1

Show:10 25 50

← PrevPage 1 of 25Next →

All datasets VoxCeleb1 EVI en-GB EVI fr-FR EVI pl-PL

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	MSM-MAE	Top-1 (%)	96.6	—	Unverified
2	M2D/0.6	Top-1 (%)	96.5	—	Unverified
3	M2D/0.7	Top-1 (%)	96.3	—	Unverified
4	M2D ratio=0.6	Top-1 (%)	94.8	—	Unverified
5	AudioMAE (local)	Top-1 (%)	94.8	—	Unverified
6	ATST Base (ours)	Top-1 (%)	94.3	—	Unverified
7	AudioMAE (global)	Top-1 (%)	94.1	—	Unverified
8	AutoSpeech (N=8,C=128)	Top-1 (%)	87.66	—	Unverified
9	SSAST-FRAME	Top-1 (%)	80.8	—	Unverified
10	SSAMBA	Top-1 (%)	70.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Fuzzy Retrieval	Top-1 (%)	67.77	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Fuzzy Retrieval	Top-1 (%)	80.83	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Fuzzy Retrieval	Top-1 (%)	95.13	—	Unverified