| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Dec 19, 2024 | Audio Generation, Audio Synthesis | Code Available | 7 |
| CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | May 6, 2025 | Active Speaker Detection, Audio-Visual Speech Recognition | Code Available | 2 |
| Synchformer: Efficient Synchronization from Sparse Cues | Jan 29, 2024 | Audio-Visual Synchronization | Code Available | 2 |
| Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection | Apr 30, 2024 | Audio-Visual Synchronization, DeepFake Detection | Code Available | 1 |
| PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores | Apr 10, 2024 | Audio-Visual Synchronization | Code Available | 1 |
| Target Active Speaker Detection with Audio-visual Cues | May 22, 2023 | Active Speaker Detection, Audio-Visual Synchronization | Code Available | 1 |
| Multimodal Transformer Distillation for Audio-Visual Synchronization | Oct 27, 2022 | Audio-Visual Synchronization | Code Available | 1 |
| Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors | Oct 13, 2022 | Audio-Visual Synchronization | Code Available | 1 |
| VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices | Apr 5, 2022 | Audio-Visual Synchronization, Music Source Separation | Code Available | 1 |