| Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization | May 18, 2023 | Audio-Visual Speech Recognition, Prompt Engineering | Code Available | 1 |
| Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | May 16, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| Multi-Temporal Lip-Audio Memory for Visual Speech Recognition | May 8, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition | Apr 30, 2023 | Deep Learning, Face Recognition | Unverified | 0 |
| SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision | Mar 30, 2023 | Lip Reading, speech-recognition | Unverified | 0 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Mar 25, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 |
| Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring | Mar 15, 2023 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge | Mar 11, 2023 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition | Mar 9, 2023 | Lip Reading, Machine Translation | Code Available | 1 |
| MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Mar 1, 2023 | Audio-Visual Speech Recognition, Robust Speech Recognition | Code Available | 2 |
| Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | Feb 27, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Conformers are All You Need for Visual Speech Recognition | Feb 17, 2023 | All, Lipreading | Unverified | 0 |
| Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices | Feb 17, 2023 | Audio-Visual Speech Recognition, Gesture Recognition | Unverified | 0 |
| Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition | Feb 16, 2023 | Sentence, speech-recognition | Unverified | 0 |
| AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations | Feb 10, 2023 | Audio-Visual Speech Recognition, Self-Supervised Learning | Unverified | 0 |
| A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset | Jan 21, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset | Jan 16, 2023 | Audio-Visual Speech Recognition, Lip Reading | Code Available | 1 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | Jan 1, 2023 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | Dec 21, 2022 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| Jointly Learning Visual and Auditory Speech Representations from Raw Data | Dec 12, 2022 | Audio-Visual Speech Recognition, Lipreading | Code Available | 1 |
| Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning | Dec 10, 2022 | Audio-Visual Speech Recognition, reinforcement-learning | Unverified | 0 |
| VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning | Nov 21, 2022 | Audio-Visual Speech Recognition, Language Modelling | Unverified | 0 |
| Streaming Audio-Visual Speech Recognition with Alignment Regularization | Nov 3, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Visual Speech Recognition in a Driver Assistance System | Aug 29, 2022 | Data Augmentation, Lipreading | Unverified | 0 |
| Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition | Jul 13, 2022 | Audio-Visual Speech Recognition, Decoder | Code Available | 1 |