| Direct Punjabi to English speech translation using discrete units | Feb 25, 2024 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0 |
| Hands-Free VR | Feb 23, 2024 | Diversity, Language Modelling | Unverified | 0 |
| OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification | Feb 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? | Feb 19, 2024 | Speech-to-Text, Speech-to-Text Translation | Unverified | 0 |
| Syllable based DNN-HMM Cantonese Speech to Text System | Feb 13, 2024 | speech-recognition, Speech Recognition | Unverified | 0 |
| Careless Whisper: Speech-to-Text Hallucination Harms | Feb 12, 2024 | Hallucination, Language Modeling | Code Available | 0 |
| Named Entity Recognition for Address Extraction in Speech-to-Text Transcriptions Using Synthetic Data | Feb 8, 2024 | named-entity-recognition, Named Entity Recognition | Unverified | 0 |
| Digits micro-model for accurate and secure transactions | Feb 2, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Streaming Sequence Transduction through Dynamic Compression | Feb 2, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| A Case Study on Filtering for End-to-End Speech Translation | Feb 2, 2024 | Speech-to-Speech Translation, Speech-to-Text | Unverified | 0 |
| Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases | Feb 1, 2024 | speech-recognition, Speech Recognition | Unverified | 0 |
| Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks | Jan 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | Jan 8, 2024 | Language Modelling, Large Language Model | Code Available | 0 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Dec 30, 2023 | Speech-to-Text, Speech-to-Text Translation | Code Available | 0 |
| OAVA: the open audio-visual archives aggregator | Dec 16, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Revisiting the Entropy Semiring for Neural Speech Recognition | Dec 13, 2023 | speech-recognition, Speech Recognition | Unverified | 0 |
| Efficient Monotonic Multihead Attention | Dec 7, 2023 | Simultaneous Speech-to-Text Translation, Speech-to-Text | Unverified | 0 |
| End-to-End Speech-to-Text Translation: A Survey | Dec 2, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Multi-teacher Distillation for Multilingual Spelling Correction | Nov 20, 2023 | Multilingual NLP, Speech-to-Text | Unverified | 0 |
| COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning | Nov 3, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation | Oct 13, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Toward Joint Language Modeling for Speech Units and Text | Oct 12, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach | Oct 6, 2023 | Simultaneous Speech-to-Text Translation, Speech-to-Text | Unverified | 0 |
| Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer | Oct 5, 2023 | Speech-to-Text, Speech-to-Text Translation | Unverified | 0 |
| AfriSpeech-200: Pan-African Accented Speech Dataset for Clinical and General Domain ASR | Sep 30, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |