D4AM: A General Denoising Framework for Downstream Acoustic Models Nov 28, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Do VSR Models Generalize Beyond LRS3? Nov 23, 2023 Lip Reading speech-recognition
Zero-shot audio captioning with audio-language model guidance and audio context keywords Nov 14, 2023 Audio captioning Descriptive
Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation Nov 9, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition Nov 8, 2023 CPU Decoder
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning Nov 7, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts Nov 2, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Automatic Disfluency Detection from Untranscribed Speech Nov 1, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation Nov 1, 2023 Automatic Speech Recognition speech-recognition
Developing a Multilingual Dataset and Evaluation Metrics for Code-Switching: A Focus on Hong Kong's Polylingual Dynamics Oct 27, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
CL-MASR: A Continual Learning Benchmark for Multilingual ASR Oct 25, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
ArTST: Arabic Text and Speech Transformer Oct 25, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Accented Speech Recognition With Accent-specific Codebooks Oct 24, 2023 Accented Speech Recognition Automatic Speech Recognition
How Much Context Does My Attention-Based ASR System Need? Oct 24, 2023 speech-recognition Speech Recognition
Advancing Test-Time Adaptation in Wild Acoustic Test Settings Oct 14, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching Oct 3, 2023 speech-recognition Speech Recognition
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech Oct 1, 2023 speech-recognition Speech Recognition
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation Sep 29, 2023 Audio-Visual Speech Recognition speech-recognition
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models Sep 27, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Speech collage: code-switched audio generation by collaging monolingual corpora Sep 27, 2023 Audio Generation Automatic Speech Recognition
Updated Corpora and Benchmarks for Long-Form Speech Recognition Sep 26, 2023 Form speech-recognition
Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data Sep 25, 2023 Speech Recognition Translation
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning Sep 25, 2023 Representation Learning Self-Supervised Learning
Memory-augmented conformer for improved end-to-end long-form ASR Sep 22, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition Sep 21, 2023 speech-recognition Speech Recognition
Fine-Tuning Self-Supervised Learning Models for End-to-End Pronunciation Scoring Sep 19, 2023 Feature Engineering Phone-level pronunciation scoring
HypR: A comprehensive study for ASR hypothesis revising with a reference corpus Sep 18, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
DiaCorrect: Error Correction Back-end For Speaker Diarization Sep 15, 2023 Automatic Speech Recognition Decoder
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper Sep 15, 2023 Language Identification speech-recognition
Unimodal Aggregation for CTC-based Speech Recognition Sep 15, 2023 Automatic Speech Recognition Decoder
EnCodecMAE: Leveraging neural codecs for universal audio representation learning Sep 14, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
DiariST: Streaming Speech Translation with Speaker Diarization Sep 14, 2023 speaker-diarization Speaker Diarization
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing Sep 2, 2023 speech-recognition Speech Recognition
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder Aug 14, 2023 Audio-Visual Speech Recognition Automatic Speech Recognition
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation Aug 8, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus Jul 29, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures Jul 27, 2023 Automatic Speech Recognition Contrastive Learning
Adaptation of Whisper models to child speech recognition Jul 24, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning Jul 18, 2023 Domain Adaptation speech-recognition
ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development Jul 17, 2023 Action Detection Activity Detection
Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound Jul 17, 2023 Backdoor Attack speech-recognition
Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion Jul 1, 2023 speech-recognition Speech Recognition
Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings Jun 30, 2023 Audio Classification speech-recognition
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT Jun 29, 2023 Automatic Lyrics Transcription Language Modeling
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning Jun 21, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision Jun 21, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Quilt-1M: One Million Image-Text Pairs for Histopathology Jun 20, 2023 Automatic Speech Recognition Cross-Modal Retrieval
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition Jun 18, 2023 Audio-Visual Speech Recognition speech-recognition
DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model Jun 18, 2023 Data Augmentation Decoder
STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization Jun 18, 2023 All Graph Learning