| OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Aug 6, 2024 | Benchmarking, Retrieval-augmented Generation | Code Available | 1 | 5 |
| Speech Emotion Recognition with Multi-Task Learning | Sep 6, 2021 | Emotion Classification, Emotion Recognition | Code Available | 1 | 5 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech Synthesis, Speech-to-Text | Code Available | 1 | 5 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 | 5 |
| Stacked DeBERT: All Attention in Incomplete Data for Text Classification | Jan 1, 2020 | All, Chatbot | Code Available | 1 | 5 |
| Pre-training for Speech Translation: CTC Meets Optimal Transport | Jan 27, 2023 | Multi-Task Learning, Speech-to-Text | Code Available | 1 | 5 |
| Brilla AI: AI Contestant for the National Science and Maths Quiz | Mar 4, 2024 | Math, Question Answering | Code Available | 1 | 5 |
| PSST! Prosodic Speech Segmentation with Transformers | Feb 3, 2023 | Segmentation, Speech-to-Text | Code Available | 1 | 5 |
| DUB: Discrete Unit Back-translation for Speech Translation | May 19, 2023 | Machine Translation, Speech-to-Text | Code Available | 1 | 5 |
| End-to-end Speech Translation via Cross-modal Progressive Training | Apr 21, 2021 | Machine Translation, Speech-to-Text | Code Available | 1 | 5 |
| Deep Reinforcement Learning For Sequence to Sequence Models | May 24, 2018 | Abstractive Text Summarization, Caption Generation | Code Available | 1 | 5 |
| Revisiting Interpolation Augmentation for Speech-to-Text Generation | Jun 22, 2024 | Speech-to-Text, Text Generation | Code Available | 1 | 5 |
| A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Mar 18, 2022 | Representation Learning, Speaker Verification | Code Available | 1 | 5 |
| Denial-of-Service Poisoning Attacks against Large Language Models | Oct 14, 2024 | 16k, Speech-to-Text | Code Available | 1 | 5 |
| STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation | Mar 20, 2022 | Machine Translation, Speech-to-Text | Code Available | 1 | 5 |
| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Nov 1, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 1 | 5 |
| Careless Whisper: Speech-to-Text Hallucination Harms | Feb 12, 2024 | Hallucination, Language Modeling | Code Available | 0 | 5 |
| Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation | Dec 6, 2019 | Form, Machine Translation | Code Available | 0 | 5 |
| Calibrated SVM for Probabilistic Classification of In-Vehicle Voices into Vehicle Commands via Voice-to-Text LLM Transformation | Jun 28, 2024 | Speech-to-Text, text-classification | Code Available | 0 | 5 |
| Revisiting End-to-End Speech-to-Text Translation From Scratch | Jun 9, 2022 | Decoder, speech-recognition | Code Available | 0 | 5 |
| SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation | Oct 13, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy | Oct 13, 2022 | Generative Adversarial Network, Speaker anonymization | Code Available | 0 | 5 |
| Pre-training on high-resource speech recognition improves low-resource speech-to-text translation | Sep 5, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Scribosermo: Fast Speech-to-Text models for German and other Languages | Oct 15, 2021 | Speech Recognition, Speech-to-Text | Code Available | 0 | 5 |
| BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | May 29, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | Sep 13, 2024 | In-Context Learning, Retrieval | Code Available | 0 | 5 |
| OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification | Feb 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| An Empirical Study of Consistency Regularization for End-to-End Speech-to-Text Translation | Aug 28, 2023 | Machine Translation, NMT | Code Available | 0 | 5 |
| A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Jan 6, 2022 | Speech-to-Text, text-to-speech | Code Available | 0 | 5 |
| M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation | Jul 3, 2022 | Decoder, Speech-to-Text | Code Available | 0 | 5 |
| Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Feb 19, 2025 | Automatic Speech Recognition, speech-recognition | Code Available | 0 | 5 |
| MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition | Nov 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Automatic Quality Assessment for Speech Translation Using Joint ASR and MT Features | Sep 20, 2016 | Speech-to-Text, Translation | Code Available | 0 | 5 |
| Let's Give a Voice to Conversational Agents in Virtual Reality | Aug 4, 2023 | Speech-to-Text, text-to-speech | Code Available | 0 | 5 |
| Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset | Nov 29, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| CoVoSwitch: Machine Translation of Synthetic Code-Switched Text Based on Intonation Units | Jul 19, 2024 | Machine Translation, Speech-to-Text | Code Available | 0 | 5 |
| Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation | Feb 9, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| LibriS2S: A German-English Speech-to-Speech Translation Corpus | Apr 22, 2022 | Speech-to-Speech Translation, Speech-to-Text | Code Available | 0 | 5 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Dec 30, 2023 | Speech-to-Text, Speech-to-Text Translation | Code Available | 0 | 5 |
| A Dataset for Speech Emotion Recognition in Greek Theatrical Plays | Mar 27, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition | Dec 23, 2021 | Benchmarking, Deep Learning | Code Available | 0 | 5 |
| Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning | Sep 21, 2016 | Decoder, Multi-Task Learning | Code Available | 0 | 5 |
| A Change of Heart: Improving Speech Emotion Recognition through Speech-to-Text Modality Conversion | Jul 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Contextualized Translation of Automatically Segmented Speech | Aug 5, 2020 | Segmentation, Sentence | Code Available | 0 | 5 |
| Audio Adversarial Examples: Targeted Attacks on Speech-to-Text | Jan 5, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Infusing Future Information into Monotonic Attention Through Language Models | Sep 7, 2021 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Jul 9, 2024 | coreference-resolution, Coreference Resolution | Code Available | 0 | 5 |
| Attentively Embracing Noise for Robust Latent Representation in BERT | Dec 1, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models | Jun 29, 2022 | Intent Classification, Slot Filling | Code Available | 0 | 5 |
| Fleurs-SLU: A Massively Multilingual Benchmark for Spoken Language Understanding | Jan 10, 2025 | Automatic Speech Recognition, Classification | Code Available | 0 | 5 |