| Brilla AI: AI Contestant for the National Science and Maths Quiz | Mar 4, 2024 | Math, Question Answering | Code Available | 1 |
| ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation | May 24, 2023 | GPU, Language Modeling | Code Available | 1 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 |
| Clotho: An Audio Captioning Dataset | Oct 21, 2019 | Audio captioning, Diversity | Code Available | 1 |
| Denial-of-Service Poisoning Attacks against Large Language Models | Oct 14, 2024 | 16k, Speech-to-Text | Code Available | 1 |
| CoVoST: A Diverse Multilingual Speech-To-Text Translation Corpus | Feb 4, 2020 | Speech-to-Text, Speech-to-Text Translation | Code Available | 1 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech Synthesis, Speech-to-Text | Code Available | 1 |
| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Nov 1, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 1 |
| Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Apr 27, 2025 | RAG, Retrieval | Code Available | 1 |
| A Large-Scale Chinese Multimodal NER Dataset with Speech Clues | Aug 1, 2021 | named-entity-recognition, Named Entity Recognition | Code Available | 1 |