MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Oct 1, 2024 | Automatic Speech Recognition, speech-recognition | Code Available | 2
Mamba for Streaming ASR Combined with Unimodal Aggregation | Sep 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1
Predictive Speech Recognition and End-of-Utterance Detection Towards Spoken Dialog Systems | Sep 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Alignment-Free Training for Transducer-based Multi-Talker ASR | Sep 30, 2024 | All, Automatic Speech Recognition | Unverified | 0
AfriHuBERT: A self-supervised speech representation model for African languages | Sep 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding | Sep 30, 2024 | speech-recognition, Speech Recognition | Unverified | 0
Efficient Long-Form Speech Recognition for General Speech In-Context Learning | Sep 29, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility | Sep 29, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | Sep 29, 2024 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought | Sep 29, 2024 | speech-recognition, Speech Recognition | Unverified | 0
Advanced Clustering Techniques for Speech Signal Enhancement: A Review and Metanalysis of Fuzzy C-Means, K-Means, and Kernel Fuzzy C-Means Methods | Sep 28, 2024 | Clustering, Speech Enhancement | Unverified | 0
A GEN AI Framework for Medical Note Generation | Sep 27, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models | Sep 27, 2024 | Automatic Speech Recognition, Mamba | Unverified | 0
Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking | Sep 27, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition | Sep 26, 2024 | Decoder, Robust Speech Recognition | Unverified | 0
Unveiling the Role of Pretraining in Direct Speech Translation | Sep 26, 2024 | Automatic Speech Recognition, Decoder | Unverified | 0
Are Transformers in Pre-trained LM A Good ASR Encoder? An Empirical Study | Sep 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Deep CLAS: Deep Contextual Listen, Attend and Spell | Sep 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events | Sep 25, 2024 | Audio Tagging, Automatic Speech Recognition | Unverified | 0
Weighted Cross-entropy for Low-Resource Languages in Multilingual Speech Recognition | Sep 25, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
Speech Recognition Rescoring with Large Speech-Text Foundation Models | Sep 25, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling | Sep 25, 2024 | Automatic Speech Recognition, Emotion Recognition | Code Available | 0
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | Sep 25, 2024 | Automatic Speech Recognition, speech-recognition | Unverified | 0
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Sep 24, 2024 | Management, speech-recognition | Code Available | 3
Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices | Sep 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Revisiting Acoustic Features for Robust ASR | Sep 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs | Sep 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens | Sep 24, 2024 | Clustering, Decoder | Unverified | 0
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM | Sep 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction | Sep 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
Strong Alone, Stronger Together: Synergizing Modality-Binding Foundation Models with Optimal Transport for Non-Verbal Emotion Recognition | Sep 21, 2024 | Audio Deepfake Detection, DeepFake Detection | Unverified | 0
MultiMed: Multilingual Medical Speech Recognition via Attention Encoder Decoder | Sep 21, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection | Sep 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Large Language Model Should Understand Pinyin for Chinese ASR Error Correction | Sep 20, 2024 | Automatic Speech Recognition, Language Modeling | Unverified | 0
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper | Sep 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR | Sep 20, 2024 | ARC, Automatic Speech Recognition | Unverified | 0
A Multimodal Dense Retrieval Approach for Speech-Based Open-Domain Question Answering | Sep 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Personalized Speech Recognition for Children with Test-Time Adaptation | Sep 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition | Sep 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts | Sep 19, 2024 | Mixture-of-Experts, Robust Speech Recognition | Unverified | 0
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC | Sep 19, 2024 | Disentanglement, speech-recognition | Code Available | 1
Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space | Sep 19, 2024 | Automatic Speech Recognition, Data Augmentation | Unverified | 0
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR | Sep 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Large Language Models are Strong Audio-Visual Speech Recognition Learners | Sep 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2
ASR Benchmarking: Need for a More Representative Conversational Dataset | Sep 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
A Joint Spectro-Temporal Relational Thinking Based Acoustic Modeling Framework | Sep 17, 2024 | Phoneme Recognition, speech-recognition | Unverified | 0
Moshi: a speech-text foundation model for real-time dialogue | Sep 17, 2024 | Action Detection, Activity Detection | Code Available | 9
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Chain-of-Thought Prompting for Speech Translation | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
WER We Stand: Benchmarking Urdu ASR Models | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0