| End-to-end Speech Translation via Cross-modal Progressive Training | Apr 21, 2021 | Machine Translation, Speech-to-Text | Code Available | 1 | 5 |
| Back Translation for Speech-to-text Translation Without Transcripts | May 15, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Nov 1, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 1 | 5 |
| Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Apr 27, 2025 | RAG, Retrieval | Code Available | 1 | 5 |
| Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Jun 25, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 | 5 |
| A Large-Scale Chinese Multimodal NER Dataset with Speech Clues | Aug 1, 2021 | named-entity-recognition, Named Entity Recognition | Code Available | 1 | 5 |
| Brilla AI: AI Contestant for the National Science and Maths Quiz | Mar 4, 2024 | Math, Question Answering | Code Available | 1 | 5 |
| CoVoST 2 and Massively Multilingual Speech-to-Text Translation | Jul 20, 2020 | Machine Translation, speech-recognition | Code Available | 1 | 5 |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Feb 16, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |