Enhancing CTC-based speech recognition with diverse modeling units | Jun 5, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code Available | 5
Text Injection for Neural Contextual Biasing | Jun 5, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition | Jun 5, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing | Jun 4, 2024 | Decoder, Language Modeling | Unverified | 0
Keyword-Guided Adaptation of Automatic Speech Recognition | Jun 4, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping | Jun 4, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision | Jun 4, 2024 | Automatic Speech Recognition, speech-recognition | Unverified | 0
Enabling ASR for Low-Resource Languages: A Comprehensive Dataset Creation Approach | Jun 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization | Jun 3, 2024 | image-classification, Image Classification | Unverified | 0
YODAS: Youtube-Oriented Dataset for Audio and Speech | Jun 2, 2024 | Self-Supervised Learning, speech-recognition | Unverified | 0
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning | Jun 1, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities | May 29, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Augmented Conversation with Embedded Speech-Driven On-the-Fly Referencing in AR | May 28, 2024 | Friction, speech-recognition | Unverified | 0
NUTS, NARS, and Speech | May 28, 2024 | Dimensionality Reduction, speech-recognition | Unverified | 0
Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation | May 28, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | May 28, 2024 | Machine Translation, speech-recognition | Code Available | 2
A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition | May 27, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1
Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients | May 27, 2024 | Automatic Speech Recognition, Federated Learning | Code Available | 0
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition | May 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Contrastive and Consistency Learning for Neural Noisy-Channel Model in Spoken Language Understanding | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3
A Survey on Vision-Language-Action Models for Embodied AI | May 23, 2024 | Image Captioning, Instruction Following | Code Available | 4
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2
Contextualized Automatic Speech Recognition with Dynamic Vocabulary | May 22, 2024 | Automatic Speech Recognition, Language Modeling | Unverified | 0
ST-Gait++: Leveraging spatio-temporal convolutions for gait-based emotion recognition on videos | May 22, 2024 | Emotion Classification, Emotion Recognition | Unverified | 0
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation | May 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish | May 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Non-autoregressive real-time Accent Conversion model with voice cloning | May 21, 2024 | Speech Enhancement, speech-recognition | Unverified | 0
FairLENS: Assessing Fairness in Law Enforcement Speech Recognition | May 21, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Mamba in Speech: Towards an Alternative to Self-Attention | May 21, 2024 | Mamba, Speech Enhancement | Code Available | 2
Could a Computer Architect Understand our Brain? | May 21, 2024 | Descriptive, ERP | Unverified | 0
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information | May 21, 2024 | Speech Recognition | Code Available | 2
Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining | May 20, 2024 | Sign Language Recognition, speech-recognition | Unverified | 0
Acoustic modeling for Overlapping Speech Recognition: JHU Chime-5 Challenge System | May 17, 2024 | Data Augmentation, Speech Dereverberation | Code Available | 4
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models | May 16, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings | May 15, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
No More Mumbles: Enhancing Robot Intelligibility through Speech Adaptation | May 15, 2024 | speech-recognition, Speech Recognition | Code Available | 0
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer | May 15, 2024 | Adversarial Attack, Automatic Speech Recognition | Unverified | 0
Sonos Voice Control Bias Assessment Dataset: A Methodology for Demographic Bias Assessment in Voice Assistants | May 14, 2024 | Automatic Speech Recognition, Diversity | Unverified | 0
Investigating the 'Autoencoder Behavior' in Speech Self-Supervised Models: a focus on HuBERT's Pretraining | May 14, 2024 | Self-Supervised Learning, speech-recognition | Unverified | 0
SpeechVerse: A Large-scale Generalizable Audio Language Model | May 14, 2024 | Automatic Speech Recognition, Benchmarking | Unverified | 0
Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases | May 13, 2024 | Audio Classification, Diagnostic | Code Available | 0
Large Language Models for Education: A Survey | May 12, 2024 | Autonomous Driving, speech-recognition | Unverified | 0
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | May 12, 2024 | Action Spotting, Automatic Speech Recognition | Code Available | 1
Watch Your Mouth: Silent Speech Recognition with Depth Sensing | May 11, 2024 | Deep Learning, Lipreading | Code Available | 1
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech | May 10, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0
DP-DyLoRA: Fine-Tuning Transformer-Based Models On-Device under Differentially Private Federated Learning using Dynamic Low-Rank Adaptation | May 10, 2024 | Federated Learning, Natural Language Understanding | Unverified | 0
Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models | May 9, 2024 | Adversarial Attack, Automatic Speech Recognition | Code Available | 1
Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | May 9, 2024 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0