| PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | May 20, 2022 | All, Automatic Speech Recognition (ASR) | Code Available | 6 |
| VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Jul 16, 2024 | Diversity, Speaker Identification | Code Available | 5 |
| SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | May 20, 2024 | Audio Classification, GPU | Code Available | 2 |
| SSAST: Self-Supervised Audio Spectrogram Transformer | Oct 19, 2021 | Audio Classification, Classification | Code Available | 2 |
| audino: A Modern Annotation Tool for Audio and Speech | Jun 9, 2020 | Action Detection, Activity Detection | Code Available | 2 |
| Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings | Mar 13, 2025 | Speaker Identification, speech-recognition | Code Available | 1 |
| Disentangling Textual and Acoustic Features of Neural Speech Representations | Oct 3, 2024 | Disentanglement, Emotion Recognition | Code Available | 1 |
| ComiCap: A VLMs pipeline for dense captioning of Comic Panels | Sep 24, 2024 | Attribute, Dense Captioning | Code Available | 1 |
| CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding | Jul 4, 2024 | Dialogue Generation, object-detection | Code Available | 1 |
| InstructERC: Reforming Emotion Recognition in Conversation with Multi-task Retrieval-Augmented Large Language Models | Sep 21, 2023 | Emotion Recognition, Emotion Recognition in Conversation | Code Available | 1 |
| Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals | Jun 2, 2023 | Depression Detection, Disentanglement | Code Available | 1 |
| MPCHAT: Towards Multimodal Persona-Grounded Conversation | May 27, 2023 | Speaker Identification | Code Available | 1 |
| GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding | May 16, 2023 | Speaker Identification | Code Available | 1 |
| ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification | Nov 23, 2022 | Keyword Spotting, Self-Supervised Learning | Code Available | 1 |
| MelHuBERT: A simplified HuBERT on Mel spectrograms | Nov 17, 2022 | Automatic Speech Recognition, Self-Supervised Learning | Code Available | 1 |
| IndicSUPERB: A Speech Processing Universal Performance Benchmark for Indian languages | Aug 24, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Masked Autoencoders that Listen | Jul 13, 2022 | Audio Classification, Decoder | Code Available | 1 |
| End-to-End Chinese Speaker Identification | Jul 1, 2022 | coreference-resolution, Coreference Resolution | Code Available | 1 |
| Extended U-Net for Speaker Verification in Noisy Environments | Jun 27, 2022 | Denoising, Speaker Identification | Code Available | 1 |
| ATST: Audio Representation Learning with Teacher-Student Transformer | Apr 26, 2022 | Audio Classification, Instrument Recognition | Code Available | 1 |
| Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings | Mar 30, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech | Nov 19, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing | Oct 14, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training | Oct 12, 2021 | Data Augmentation, Multi-Task Learning | Code Available | 1 |
| FastAudio: A Learnable Audio Front-End for Spoof Speech Detection | Sep 6, 2021 | Speaker Identification, Speaker Verification | Code Available | 1 |
| Learning Audio-Visual Dereverberation | Jun 14, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding | Jun 3, 2021 | Conversational Response Selection, Language Modeling | Code Available | 1 |
| Supervised Speech Representation Learning for Parkinson's Disease Classification | Jun 1, 2021 | Classification, Representation Learning | Code Available | 1 |
| Speech Resynthesis from Discrete Disentangled Self-Supervised Representations | Apr 1, 2021 | Disentanglement, Representation Learning | Code Available | 1 |
| Blind Speech Separation and Dereverberation using Neural Beamforming | Mar 24, 2021 | Speaker Identification, Speaker Separation | Code Available | 1 |
| A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement | Feb 15, 2021 | Speaker Identification, Speech Denoising | Code Available | 1 |
| Deep Discriminative Feature Learning for Accent Recognition | Nov 25, 2020 | Face Recognition, Speaker Identification | Code Available | 1 |
| FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances | Nov 17, 2020 | Adversarial Attack, Speaker Identification | Code Available | 1 |
| Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR | Nov 3, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Sum-Product Networks for Robust Automatic Speaker Identification | Aug 13, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings | Aug 11, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| AutoSpeech: Neural Architecture Search for Speaker Recognition | May 7, 2020 | image-classification, Image Classification | Code Available | 1 |
| Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs | Apr 6, 2020 | Meta-Learning, Speaker Identification | Code Available | 1 |
| AM-MobileNet1D: A Portable Model for Speaker Recognition | Mar 31, 2020 | Deep Learning, model | Code Available | 1 |
| Speech2Phone: A Novel and Efficient Method for Training Speaker Recognition Models | Feb 25, 2020 | Speaker Identification, Speaker Recognition | Code Available | 1 |
| Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam | Jan 23, 2020 | Speaker Identification, Speech Extraction | Code Available | 1 |
| Generative Pre-Training for Speech with Autoregressive Predictive Coding | Oct 23, 2019 | Representation Learning, Speaker Identification | Code Available | 1 |
| Learning Speaker Representations with Mutual Information | Dec 1, 2018 | Sentence, Speaker Identification | Code Available | 1 |
| Speaker Recognition from Raw Waveform with SincNet | Jul 29, 2018 | Speaker Identification, Speaker Recognition | Code Available | 1 |
| CoLMbo: Speaker Language Model for Descriptive Profiling | Jun 11, 2025 | Descriptive, Language Modeling | Code Available | 0 |
| Rhythm Features for Speaker Identification | Jun 7, 2025 | Deep Learning, Rhythm | Unverified | 0 |
| French Listening Tests for the Assessment of Intelligibility, Quality, and Identity of Body-Conducted Speech Enhancement | Jun 4, 2025 | Bandwidth Extension, Speaker Identification | Unverified | 0 |
| Speech Unlearning | Jun 1, 2025 | Adversarial Robustness, Keyword Spotting | Unverified | 0 |
| Pretraining Multi-Speaker Identification for Neural Speaker Diarization | May 30, 2025 | speaker-diarization, Speaker Diarization | Unverified | 0 |
| REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion | May 27, 2025 | Disentanglement, Speaker Identification | Unverified | 0 |