SOTAVerified

Speaker Identification

Papers

Showing 150 of 248 papers

Title | Status | Hype
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Code | 5
audino: A Modern Annotation Tool for Audio and Speech | Code | 2
SSAST: Self-Supervised Audio Spectrogram Transformer | Code | 2
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | Code | 2
IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Code | 1
Learning Audio-Visual Dereverberation | Code | 1
InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models | Code | 1
Supervised Speech Representation Learning for Parkinson's Disease Classification | Code | 1
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations | Code | 1
Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings | Code | 1
Learning Speaker Representations with Mutual Information | Code | 1
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR | Code | 1
AM-MobileNet1D: A Portable Model for Speaker Recognition | Code | 1
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training | Code | 1
Extended U-Net for Speaker Verification in Noisy Environments | Code | 1
SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech | Code | 1
Speaker Recognition from Raw Waveform with SincNet | Code | 1
Generative Pre-Training for Speech with Autoregressive Predictive Coding | Code | 1
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification | Code | 1
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings | Code | 1
Sum-Product Networks for Robust Automatic Speaker Identification | Code | 1
MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding | Code | 1
Masked Autoencoders that Listen | Code | 1
ATST: Audio Representation Learning with Teacher-Student Transformer | Code | 1
A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement | Code | 1
Disentangling Textual and Acoustic Features of Neural Speech Representations | Code | 1
MelHuBERT: A simplified HuBERT on Mel spectrograms | Code | 1
Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs | Code | 1
AutoSpeech: Neural Architecture Search for Speaker Recognition | Code | 1
Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals | Code | 1
Blind Speech Separation and Dereverberation using Neural Beamforming | Code | 1
End-to-End Chinese Speaker Identification | Code | 1
MPCHAT: Towards Multimodal Persona-Grounded Conversation | Code | 1
FastAudio: A Learnable Audio Front-End for Spoof Speech Detection | Code | 1
Deep Discriminative Feature Learning for Accent Recognition | Code | 1
ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Code | 1
CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding | Code | 1
FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances | Code | 1
GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding | Code | 1
Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models | Code | 1
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing | Code | 1
Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam | Code | 1
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings | Code | 1
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification | Code | 0
Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation | Code | 0
Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario | Code | 0
Cross-Lingual Speaker Identification Using Distant Supervision | Code | 0
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input | Code | 0
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MSM-MAE | Top-1 (%) | 96.6 | | Unverified
2 | M2D/0.6 | Top-1 (%) | 96.5 | | Unverified
3 | M2D/0.7 | Top-1 (%) | 96.3 | | Unverified
4 | M2D ratio=0.6 | Top-1 (%) | 94.8 | | Unverified
5 | AudioMAE (local) | Top-1 (%) | 94.8 | | Unverified
6 | ATST Base (ours) | Top-1 (%) | 94.3 | | Unverified
7 | AudioMAE (global) | Top-1 (%) | 94.1 | | Unverified
8 | AutoSpeech (N=8,C=128) | Top-1 (%) | 87.66 | | Unverified
9 | SSAST-FRAME | Top-1 (%) | 80.8 | | Unverified
10 | SSAMBA | Top-1 (%) | 70.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 67.77 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 80.83 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Fuzzy Retrieval | Top-1 (%) | 95.13 | | Unverified