| PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | May 20, 2022 | All, Automatic Speech Recognition (ASR) | Code Available | 6 | 5 |
| VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark | Jul 16, 2024 | Diversity, Speaker Identification | Code Available | 5 | 5 |
| audino: A Modern Annotation Tool for Audio and Speech | Jun 9, 2020 | Action Detection, Activity Detection | Code Available | 2 | 5 |
| SSAST: Self-Supervised Audio Spectrogram Transformer | Oct 19, 2021 | Audio Classification, Classification | Code Available | 2 | 5 |
| SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | May 20, 2024 | Audio Classification, GPU | Code Available | 2 | 5 |
| ATST: Audio Representation Learning with Teacher-Student Transformer | Apr 26, 2022 | Audio Classification, Instrument Recognition | Code Available | 1 | 5 |
| CoMix: A Comprehensive Benchmark for Multi-Task Comic Understanding | Jul 4, 2024 | Dialogue Generation, object-detection | Code Available | 1 | 5 |
| A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement | Feb 15, 2021 | Speaker Identification, Speech Denoising | Code Available | 1 | 5 |
| Blind Speech Separation and Dereverberation using Neural Beamforming | Mar 24, 2021 | Speaker Identification, Speaker Separation | Code Available | 1 | 5 |
| ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification | Nov 23, 2022 | Keyword Spotting, Self-Supervised Learning | Code Available | 1 | 5 |