| Speech Emotion Recognition with Multi-Task Learning | Sep 6, 2021 | Emotion Classification, Emotion Recognition | Code Available | 1 |
| A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Mar 18, 2022 | Representation Learning, Speaker Verification | Code Available | 1 |
| CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus | Feb 4, 2020 | Speech-to-Text, Speech-to-Text Translation | Code Available | 1 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 |
| Towards an AI to Win Ghana's National Science and Maths Quiz | Aug 8, 2023 | Math, Question Answering | Code Available | 1 |
| Towards Automatic Speech to Sign Language Generation | Jun 24, 2021 | Speech-to-Text, Text Generation | Code Available | 1 |
| Deep Reinforcement Learning For Sequence to Sequence Models | May 24, 2018 | Abstractive Text Summarization, Caption Generation | Code Available | 1 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Jan 15, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Nov 1, 2020 | Dynamic Time Warping, Machine Translation | Code Available | 1 |
| Common Voice: A Massively-Multilingual Speech Corpus | Dec 13, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Jun 26, 2024 | ArzEn Code-switched Translation to ara, ArzEn Code-switched Translation to eng | Code Available | 1 |
| Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Apr 27, 2025 | RAG, Retrieval | Code Available | 1 |
| Late reverberation suppression using U-nets | Oct 5, 2021 | Decoder, Speech Dereverberation | Code Available | 1 |
| CoVoST 2 and Massively Multilingual Speech-to-Text Translation | Jul 20, 2020 | Machine Translation, speech-recognition | Code Available | 1 |
| Pushing the Limits of Zero-shot End-to-End Speech Translation | Feb 16, 2024 | Speech-to-Text, Speech-to-Text Translation | Code Available | 1 |
| Cross-modal Contrastive Learning for Speech Translation | May 5, 2022 | Contrastive Learning, Retrieval | Code Available | 1 |
| Challenges and Opportunities of Speech Recognition for Bengali Language | Sep 27, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Comparative Study on End-to-end Speech to Text Translation | Nov 20, 2019 | Speech-to-Text, Speech-to-Text Translation | Unverified | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning, Speech Synthesis | Unverified | 0 |
| Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization | Oct 29, 2024 | GPU, Retrieval | Unverified | 0 |
| A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks | Oct 21, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Corpus Creation and Evaluation for Speech-to-Text and Speech Translation | Aug 1, 2021 | Machine Translation, Speech-to-Text | Unverified | 0 |
| BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text | Aug 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Application-Agnostic Language Modeling for On-Device ASR | May 16, 2023 | Automatic Speech Recognition, Language Modeling | Unverified | 0 |
| Bridging the Modality Gap for Speech-to-Text Translation | Oct 28, 2020 | Decoder, Speech-to-Text | Unverified | 0 |