| End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation | Nov 1, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 1 | 5 |
| End-to-end Speech Translation via Cross-modal Progressive Training | Apr 21, 2021 | Machine Translation, Speech-to-Text | Code Available | 1 | 5 |
| Information-Transport-based Policy for Simultaneous Translation | Oct 22, 2022 | Machine Translation, Speech-to-Text | Code Available | 1 | 5 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 | 5 |
| Clotho: An Audio Captioning Dataset | Oct 21, 2019 | Audio captioning, Diversity | Code Available | 1 | 5 |
| Fine-tuning Whisper on Low-Resource Languages for Real-World Applications | Dec 20, 2024 | Form, Sentence | Code Available | 1 | 5 |
| A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Mar 18, 2022 | Representation Learning, Speaker Verification | Code Available | 1 | 5 |
| FlexiBO: A Decoupled Cost-Aware Multi-Objective Optimization Approach for Deep Neural Networks | Jan 18, 2020 | Bayesian Optimization, Object Detection | Code Available | 1 | 5 |
| Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation | May 11, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Late reverberation suppression using U-nets | Oct 5, 2021 | Decoder, Speech Dereverberation | Code Available | 1 | 5 |