| VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation | May 25, 2023 | Decoder, Language Modeling | Unverified | 0 |
| ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation | May 24, 2023 | GPU, Language Modeling | Code Available | 1 |
| Improving Metrics for Speech Translation | May 22, 2023 | Speech-to-Text, Translation | Unverified | 0 |
| DUB: Discrete Unit Back-translation for Speech Translation | May 19, 2023 | Machine Translation, Speech-to-Text | Code Available | 1 |
| Application-Agnostic Language Modeling for On-Device ASR | May 16, 2023 | Automatic Speech Recognition, Language Modeling | Unverified | 0 |
| A Whisper transformer for audio captioning trained with synthetic captions and transfer learning | May 15, 2023 | Audio captioning, Speech-to-Text | Code Available | 1 |
| Back Translation for Speech-to-text Translation Without Transcripts | May 15, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks | May 4, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Improving Autoregressive NLP Tasks via Modular Linearized Attention | Apr 17, 2023 | Computational Efficiency, Machine Translation | Unverified | 0 |
| ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit | Apr 10, 2023 | Benchmarking, Simultaneous Speech-to-Text Translation | Unverified | 0 |