| Rich Semantic Knowledge Enhanced Large Language Models for Few-shot Chinese Spell Checking | Mar 13, 2024 | Chinese Spell CheckingIn-Context Learning | —Unverified | 0 |
| Robust Semantic Communications for Speech Transmission | Mar 8, 2024 | Generative Adversarial NetworkSemantic Communication | —Unverified | 0 |
| Brilla AI: AI Contestant for the National Science and Maths Quiz | Mar 4, 2024 | MathQuestion Answering | CodeCode Available | 1 |
| Compact Speech Translation Models via Discrete Speech Units Pretraining | Feb 29, 2024 | DecoderSelf-Supervised Learning | —Unverified | 0 |
| Direct Punjabi to English speech translation using discrete units | Feb 25, 2024 | Speech-to-Speech TranslationSpeech-to-Text | —Unverified | 0 |
| Hands-Free VR | Feb 23, 2024 | DiversityLanguage Modelling | —Unverified | 0 |
| OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification | Feb 20, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 0 |
| Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing? | Feb 19, 2024 | Speech-to-TextSpeech-to-Text Translation | —Unverified | 0 |
| Pushing the Limits of Zero-shot End-to-End Speech Translation | Feb 16, 2024 | Speech-to-TextSpeech-to-Text Translation | CodeCode Available | 1 |
| Syllable based DNN-HMM Cantonese Speech to Text System | Feb 13, 2024 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| Careless Whisper: Speech-to-Text Hallucination Harms | Feb 12, 2024 | HallucinationLanguage Modeling | CodeCode Available | 0 |
| Named Entity Recognition for Address Extraction in Speech-to-Text Transcriptions Using Synthetic Data | Feb 8, 2024 | named-entity-recognitionNamed Entity Recognition | —Unverified | 0 |
| Digits micro-model for accurate and secure transactions | Feb 2, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| A Case Study on Filtering for End-to-End Speech Translation | Feb 2, 2024 | Speech-to-Speech TranslationSpeech-to-Text | —Unverified | 0 |
| Streaming Sequence Transduction through Dynamic Compression | Feb 2, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 0 |
| Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases | Feb 1, 2024 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | BenchmarkingImage to text | CodeCode Available | 1 |
| Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks | Jan 18, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild | Jan 8, 2024 | Language ModellingLarge Language Model | CodeCode Available | 0 |
| Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision | Dec 30, 2023 | Speech-to-TextSpeech-to-Text Translation | CodeCode Available | 0 |
| OAVA: the open audio-visual archives aggregator | Dec 16, 2023 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| Revisiting the Entropy Semiring for Neural Speech Recognition | Dec 13, 2023 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| Efficient Monotonic Multihead Attention | Dec 7, 2023 | Simultaneous Speech-to-Text TranslationSpeech-to-Text | —Unverified | 0 |
| End-to-End Speech-to-Text Translation: A Survey | Dec 2, 2023 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| Multi-teacher Distillation for Multilingual Spelling Correction | Nov 20, 2023 | Multilingual NLPSpeech-to-Text | —Unverified | 0 |