STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization Jun 18, 2023 All Graph Learning
Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation Jun 15, 2023 Automatic Speech Recognition Clustering
Utilizing Longitudinal Chest X-Rays and Reports to Pre-Fill Radiology Reports Jun 14, 2023 Decoder speech-recognition
ITALIC: An Italian Intent Classification Dataset Jun 14, 2023 Classification intent-classification
Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages Jun 13, 2023 Contrastive Learning speech-recognition
OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment Jun 10, 2023 Audio-Visual Speech Recognition Lip Reading
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages Jun 7, 2023 Cross-Lingual Transfer speech-recognition
Allophant: Cross-lingual Phoneme Recognition with Articulatory Attributes Jun 7, 2023 Attribute Cross-Lingual Transfer
MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information Jun 4, 2023 Audio-Visual Speech Recognition speech-recognition
SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization Jun 3, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Can Contextual Biasing Remain Effective with Whisper and GPT-2? Jun 2, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Improved DeepFake Detection Using Whisper Features Jun 2, 2023 Automatic Speech Recognition DeepFake Detection
DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model Jun 2, 2023 speech-recognition Speech Recognition
Perception and Semantic Aware Regularization for Sequential Confidence Calibration May 31, 2023 Language Modelling speech-recognition
Bridging the Granularity Gap for Acoustic Modeling May 27, 2023 speech-recognition Speech Recognition
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba May 26, 2023 Machine Translation speech-recognition
Scaling Speech Technology to 1,000+ Languages May 22, 2023 Automatic Speech Recognition Language Identification
CopyNE: Better Contextual ASR by Copying Named Entities May 22, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation May 18, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization May 18, 2023 Audio-Visual Speech Recognition Prompt Engineering
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering May 18, 2023 Acoustic Unit Discovery Clustering
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition May 16, 2023 Audio-Visual Speech Recognition Automatic Speech Recognition
Back Translation for Speech-to-text Translation Without Transcripts May 15, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
CB-Conformer: Contextual biasing Conformer for biased word recognition Apr 19, 2023 Automatic Speech Recognition Language Modeling
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations Apr 13, 2023 Intent Classification Intent Classification and Slot Filling
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP Mar 28, 2023 Automatic Speech Recognition speech-recognition
Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring Mar 15, 2023 Audio-Visual Speech Recognition speech-recognition
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings Mar 7, 2023 Action Detection Activity Detection
Calibrating Transformers via Sparse Gaussian Processes Mar 4, 2023 Bayesian Inference Gaussian Processes
BrainBERT: Self-supervised representation learning for intracranial recordings Feb 28, 2023 Language Modeling Language Modelling
Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding Feb 27, 2023 Model Compression Representation Learning
Gradient Remedy for Multi-Task Learning in End-to-End Noise-Robust Speech Recognition Feb 22, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One Feb 20, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition Feb 2, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation Jan 30, 2023 Automatic Speech Recognition Knowledge Distillation
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset Jan 16, 2023 Audio-Visual Speech Recognition Lip Reading
Audio-Visual Efficient Conformer for Robust Speech Recognition Jan 4, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Towards Voice Reconstruction from EEG during Imagined Speech Jan 2, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Learning to Detect Noisy Labels Using Model-Based Features Dec 28, 2022 Meta-Learning speech-recognition
Skit-S2I: An Indian Accented Speech to Intent dataset Dec 26, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language Dec 14, 2022 Decoder image-classification
Jointly Learning Visual and Auditory Speech Representations from Raw Data Dec 12, 2022 Audio-Visual Speech Recognition Lipreading
BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm Dec 11, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
GPU-accelerated Guided Source Separation for Meeting Transcription Dec 10, 2022 blind source separation CPU
SoftCTC -- Semi-Supervised Learning for Text Recognition using Soft Pseudo-Labels Dec 5, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems Nov 29, 2022 speech-recognition Speech Recognition
A Persian ASR-based SER: Modification of Sharif Emotional Speech Database and Investigation of Persian Text Corpora Nov 18, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
MelHuBERT: A simplified HuBERT on Mel spectrograms Nov 17, 2022 Automatic Speech Recognition Self-Supervised Learning
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets Nov 14, 2022 Automatic Speech Recognition Multi-Task Learning
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications Nov 8, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)