Speaker Identification

Papers

Showing 1–50 of 248 papers

Title	Date	Tasks	Status	Hype
CoLMbo: Speaker Language Model for Descriptive Profiling	Jun 11, 2025	Descriptive, Language Modeling	Code Available	0
Rhythm Features for Speaker Identification	Jun 7, 2025	Deep Learning, Rhythm	Unverified	0
French Listening Tests for the Assessment of Intelligibility, Quality, and Identity of Body-Conducted Speech Enhancement	Jun 4, 2025	Bandwidth Extension, Speaker Identification	Unverified	0
Speech Unlearning	Jun 1, 2025	Adversarial Robustness, Keyword Spotting	Unverified	0
Pretraining Multi-Speaker Identification for Neural Speaker Diarization	May 30, 2025	speaker-diarization, Speaker Diarization	Unverified	0
REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion	May 27, 2025	Disentanglement, Speaker Identification	Unverified	0
HPP-Voice: A Large-Scale Evaluation of Speech Embeddings for Multi-Phenotypic Classification	May 22, 2025	speaker-diarization, Speaker Diarization	Unverified	0
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio	May 15, 2025	Speaker Identification, speech-recognition	Unverified	0
From Dialect Gaps to Identity Maps: Tackling Variability in Speaker Verification	Apr 21, 2025	Data Augmentation, Speaker Identification	Unverified	0
Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues	Apr 21, 2025	Benchmarking, Speaker Identification	Unverified	0
Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings	Mar 13, 2025	Speaker Identification, speech-recognition	Code Available	1
Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization	Feb 18, 2025	Automatic Speech Recognition, Speaker Identification	Unverified	0
A Preliminary Exploration with GPT-4o Voice Mode	Feb 14, 2025	Age Classification, Audio Deepfake Detection	Unverified	0
SCDiar: a streaming diarization system based on speaker change detection and speech recognition	Jan 28, 2025	Change Detection, speaker-diarization	Unverified	0
Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models	Jan 24, 2025	Emotion Classification, Speaker Identification	Unverified	0
PolInterviews -- A Dataset of German Politician Public Broadcast Interviews	Jan 8, 2025	Speaker Identification	Unverified	0
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding	Dec 23, 2024	Speaker Identification	Code Available	0
Machine Unlearning reveals that the Gender-based Violence Victim Condition can be detected from Speech in a Speaker-Agnostic Setting	Nov 27, 2024	Machine Unlearning, Speaker Identification	Unverified	0
Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network	Nov 22, 2024	Data Augmentation, Speaker Identification	Code Available	0
Towards Advanced Speech Signal Processing: A Statistical Perspective on Convolution-Based Architectures and its Applications	Nov 20, 2024	Emotion Recognition, Speaker Identification	Unverified	0
Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments	Oct 7, 2024	Speaker Identification, speech-recognition	Unverified	0
Disentangling Textual and Acoustic Features of Neural Speech Representations	Oct 3, 2024	Disentanglement, Emotion Recognition	Code Available	1
Enhancing Open-Set Speaker Identification through Rapid Tuning with Speaker Reciprocal Points and Negative Sample	Sep 24, 2024	Speaker Identification, Speaker Recognition	Unverified	0
Exploring VQ-VAE with Prosody Parameters for Speaker Anonymization	Sep 24, 2024	Decoder, Speaker anonymization	Unverified	0
ComiCap: A VLMs pipeline for dense captioning of Comic Panels	Sep 24, 2024	Attribute, Dense Captioning	Code Available	1
How Redundant Is the Transformer Stack in Speech Representation Models?	Sep 10, 2024	Knowledge Distillation, Speaker Identification	Unverified	0
A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR	Sep 9, 2024	Automatic Speech Recognition, speaker-diarization	Unverified	0
Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue	Sep 7, 2024	Question Answering, Speaker Identification	Code Available	0
Progressive Residual Extraction based Pre-training for Speech Representation Learning	Aug 31, 2024	Emotion Recognition, Representation Learning	Unverified	0
Deep Learning for Speaker Identification: Architectural Insights from AB-1 Corpus Analysis and Performance Evaluation	Aug 13, 2024	Speaker Identification	Code Available	0
Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models	Jul 16, 2024	Attribute, Speaker Identification	Code Available	0
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark	Jul 16, 2024	Diversity, Speaker Identification	Code Available	5
CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding	Jul 4, 2024	Dialogue Generation, object-detection	Code Available	1
DASB -- Discrete Audio and Speech Benchmark	Jun 20, 2024	Benchmarking, Emotion Recognition	Unverified	0
Evaluating Speaker Identity Coding in Self-supervised Models and Humans	Jun 14, 2024	Speaker Identification	Unverified	0
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model	May 20, 2024	Audio Classification, GPU	Code Available	2
TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches	Apr 18, 2024	Age Estimation, Classification	Unverified	0
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework	Apr 9, 2024	Audio Classification	Unverified	0
Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling	Apr 1, 2024	Speaker Identification, Speech Synthesis	Unverified	0
Hearing-Loss Compensation Using Deep Neural Networks: A Framework and Results From a Listening Test	Mar 15, 2024	Music Classification, Speaker Identification	Unverified	0
A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement	Mar 3, 2024	Automatic Speech Recognition, Keyword Spotting	Unverified	0
Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification	Feb 29, 2024	Adversarial Attack, Classification	Unverified	0
Effect of utterance duration and phonetic content on speaker identification using second-order statistical methods	Feb 26, 2024	Speaker Identification	Unverified	0
Significance of Chirp MFCC as a Feature in Speech and Audio Applications	Feb 19, 2024	Music Classification, Speaker Identification	Unverified	0
Probing Self-supervised Learning Models with Target Speech Extraction	Feb 17, 2024	Self-Supervised Learning, Speaker Identification	Unverified	0
Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis	Feb 11, 2024	Rhythm, Speaker Identification	Unverified	0
Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models	Jan 23, 2024	Speaker Identification, Speaker Recognition	Unverified	0
SIG: Speaker Identification in Literature via Prompt-Based Generation	Dec 22, 2023	Speaker Identification	Code Available	0
Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices	Dec 20, 2023	Speaker Identification, Speaker Recognition	Unverified	0
Efficiency-oriented approaches for self-supervised speech representation learning	Dec 18, 2023	Automatic Speech Recognition, Representation Learning	Unverified	0

Datasets: VoxCeleb1, EVI en-GB, EVI fr-FR, EVI pl-PL

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	MSM-MAE	Top-1 (%)	96.6	—	Unverified
2	M2D/0.6	Top-1 (%)	96.5	—	Unverified
3	M2D/0.7	Top-1 (%)	96.3	—	Unverified
4	M2D ratio=0.6	Top-1 (%)	94.8	—	Unverified
5	AudioMAE (local)	Top-1 (%)	94.8	—	Unverified
6	ATST Base (ours)	Top-1 (%)	94.3	—	Unverified
7	AudioMAE (global)	Top-1 (%)	94.1	—	Unverified
8	AutoSpeech (N=8,C=128)	Top-1 (%)	87.66	—	Unverified
9	SSAST-FRAME	Top-1 (%)	80.8	—	Unverified
10	SSAMBA	Top-1 (%)	70.1	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Fuzzy Retrieval	Top-1 (%)	67.77	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Fuzzy Retrieval	Top-1 (%)	80.83	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Fuzzy Retrieval	Top-1 (%)	95.13	—	Unverified