| A Large-Scale Chinese Multimodal NER Dataset with Speech Clues | Aug 1, 2021 | named-entity-recognition, Named Entity Recognition | Code Available | 1 |
| Kosp2e: Korean Speech to English Translation Corpus | Jul 6, 2021 | speech-recognition, Speech Recognition | Code Available | 1 |
| Towards Automatic Speech to Sign Language Generation | Jun 24, 2021 | Speech-to-Text, Text Generation | Code Available | 1 |
| Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation | May 11, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Learning Shared Semantic Space for Speech-to-Text Translation | May 7, 2021 | Machine Translation, Speech-to-Text | Code Available | 1 |
| End-to-end Speech Translation via Cross-modal Progressive Training | Apr 21, 2021 | Machine Translation, Speech-to-Text | Code Available | 1 |
| IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Nov 1, 2020 | Dynamic Time Warping, Machine Translation | Code Available | 1 |
| "Listen, Understand and Translate": Triple Supervision Decouples End-to-end Speech-to-text Translation | Sep 21, 2020 | Speech-to-Text, Speech-to-Text Translation | Code Available | 1 |
| Consecutive Decoding for Speech-to-text Translation | Sep 21, 2020 | Decoder, Machine Translation | Code Available | 1 |
| CoVoST 2 and Massively Multilingual Speech-to-Text Translation | Jul 20, 2020 | Machine Translation, speech-recognition | Code Available | 1 |
| CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus | Feb 4, 2020 | Speech-to-Text, Speech-to-Text Translation | Code Available | 1 |
| FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural Networks | Jan 18, 2020 | Bayesian Optimization, Object Detection | Code Available | 1 |
| Stacked DeBERT: All Attention in Incomplete Data for Text Classification | Jan 1, 2020 | All, Chatbot | Code Available | 1 |
| Common Voice: A Massively-Multilingual Speech Corpus | Dec 13, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Clotho: An Audio Captioning Dataset | Oct 21, 2019 | Audio captioning, Diversity | Code Available | 1 |
| Deep Reinforcement Learning For Sequence to Sequence Models | May 24, 2018 | Abstractive Text Summarization, Caption Generation | Code Available | 1 |
| An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments | Jul 14, 2025 | Speech-to-Text, text-to-speech | Unverified | 0 |
| LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization | Jun 20, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data | Jun 19, 2025 | Sentence, Speech-to-Text | Unverified | 0 |
| I Speak and You Find: Robust 3D Visual Grounding with Noisy and Ambiguous Speech Inputs | Jun 17, 2025 | 3D visual grounding, Contrastive Learning | Unverified | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension, Speech Synthesis | Unverified | 0 |
| Advancing STT for Low-Resource Real-World Speech | Jun 10, 2025 | Sentence, Speech-to-Text | Unverified | 0 |
| Improving Language and Modality Transfer in Translation by Character-level Modeling | May 30, 2025 | Speech-to-Text, Speech-to-Text Translation | Unverified | 0 |
| Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios | May 30, 2025 | Cross-Lingual Transfer, Phoneme Recognition | Unverified | 0 |
| BeaverTalk: Oregon State University's IWSLT 2025 Simultaneous Speech Translation System | May 29, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |