| Extending Whisper with prompt tuning to target-speaker ASR | Dec 13, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| D4AM: A General Denoising Framework for Downstream Acoustic Models | Nov 28, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation | Nov 9, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Nov 7, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts | Nov 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Automatic Disfluency Detection from Untranscribed Speech | Nov 1, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Nov 1, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 1 |
| Developing a Multilingual Dataset and Evaluation Metrics for Code-Switching: A Focus on Hong Kong's Polylingual Dynamics | Oct 27, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| ArTST: Arabic Text and Speech Transformer | Oct 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| CL-MASR: A Continual Learning Benchmark for Multilingual ASR | Oct 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Accented Speech Recognition With Accent-specific Codebooks | Oct 24, 2023 | Accented Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| Advancing Test-Time Adaptation in Wild Acoustic Test Settings | Oct 14, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | Oct 7, 2023 | Automatic Speech Recognition, Video Captioning | Code Available | 1 |
| Speech collage: code-switched audio generation by collaging monolingual corpora | Sep 27, 2023 | Audio Generation, Automatic Speech Recognition | Code Available | 1 |
| HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models | Sep 27, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Memory-augmented conformer for improved end-to-end long-form ASR | Sep 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| HypR: A comprehensive study for ASR hypothesis revising with a reference corpus | Sep 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Unimodal Aggregation for CTC-based Speech Recognition | Sep 15, 2023 | Automatic Speech Recognition, Decoder | Code Available | 1 |
| DiaCorrect: Error Correction Back-end For Speaker Diarization | Sep 15, 2023 | Automatic Speech Recognition, Decoder | Code Available | 1 |
| EnCodecMAE: Leveraging neural codecs for universal audio representation learning | Sep 14, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder | Aug 14, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation | Aug 8, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Jul 29, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Jul 27, 2023 | Automatic Speech Recognition, Contrastive Learning | Code Available | 1 |
| Adaptation of Whisper models to child speech recognition | Jul 24, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision | Jun 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning | Jun 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Quilt-1M: One Million Image-Text Pairs for Histopathology | Jun 20, 2023 | Automatic Speech Recognition, Cross-Modal Retrieval | Code Available | 1 |
| Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation | Jun 15, 2023 | Automatic Speech Recognition, Clustering | Code Available | 1 |
| SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization | Jun 3, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Improved DeepFake Detection Using Whisper Features | Jun 2, 2023 | Automatic Speech Recognition, DeepFake Detection | Code Available | 1 |
| Can Contextual Biasing Remain Effective with Whisper and GPT-2? | Jun 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Scaling Speech Technology to 1,000+ Languages | May 22, 2023 | Automatic Speech Recognition, Language Identification | Code Available | 1 |
| CopyNE: Better Contextual ASR by Copying Named Entities | May 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | May 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | May 16, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| Back Translation for Speech-to-text Translation Without Transcripts | May 15, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| CB-Conformer: Contextual biasing Conformer for biased word recognition | Apr 19, 2023 | Automatic Speech Recognition, Language Modeling | Code Available | 1 |
| When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP | Mar 28, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 1 |
| Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition | Feb 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One | Feb 20, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition | Feb 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Cross-modal information fusion for voice spoofing detection | Feb 1, 2023 | Automatic Speech Recognition, fake voice detection | Code Available | 1 |
| Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation | Jan 30, 2023 | Automatic Speech Recognition, Knowledge Distillation | Code Available | 1 |
| Audio-Visual Efficient Conformer for Robust Speech Recognition | Jan 4, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Towards Voice Reconstruction from EEG during Imagined Speech | Jan 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Skit-S2I: An Indian Accented Speech to Intent dataset | Dec 26, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Dec 11, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels | Dec 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| A Persian ASR-based SER: Modification of Sharif Emotional Speech Database and Investigation of Persian Text Corpora | Nov 18, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |