| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 | 5 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Dec 19, 2024 | Audio Generation, Audio Synthesis | Code Available | 7 | 5 |
| Synchformer: Efficient Synchronization from Sparse Cues | Jan 29, 2024 | Audio-Visual Synchronization | Code Available | 2 | 5 |
| CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | May 6, 2025 | Active Speaker Detection, Audio-Visual Speech Recognition | Code Available | 2 | 5 |
| VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices | Apr 5, 2022 | Audio-Visual Synchronization, Music Source Separation | Code Available | 1 | 5 |
| Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection | Apr 30, 2024 | Audio-Visual Synchronization, DeepFake Detection | Code Available | 1 | 5 |
| Multimodal Transformer Distillation for Audio-Visual Synchronization | Oct 27, 2022 | Audio-Visual Synchronization | Code Available | 1 | 5 |
| Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet | Oct 5, 2021 | Audio-Visual Synchronization | Code Available | 1 | 5 |
| PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores | Apr 10, 2024 | Audio-Visual Synchronization | Code Available | 1 | 5 |
| Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors | Oct 13, 2022 | Audio-Visual Synchronization | Code Available | 1 | 5 |
| Target Active Speaker Detection with Audio-visual Cues | May 22, 2023 | Active Speaker Detection, Audio-Visual Synchronization | Code Available | 1 | 5 |
| Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation | Jun 24, 2025 | Audio Generation, Audio-Visual Synchronization | Unverified | 0 | 0 |
| Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation | Mar 25, 2021 | Audio-Visual Synchronization, Speech Separation | Unverified | 0 | 0 |
| Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning | Aug 13, 2020 | Action Recognition, Audio-Visual Synchronization | Unverified | 0 | 0 |
| SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory | Nov 2, 2022 | Audio-Visual Synchronization, Face Generation | Unverified | 0 | 0 |
| Comparative Analysis of Deep-Fake Algorithms | Sep 6, 2023 | Audio-Visual Synchronization, DeepFake Detection | Unverified | 0 | 0 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | Unverified | 0 | 0 |
| OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | May 27, 2025 | Audio-Visual Synchronization, Conversational Response Generation | Unverified | 0 | 0 |
| On Attention Modules for Audio-Visual Synchronization | Dec 14, 2018 | Audio-Visual Synchronization | Unverified | 0 | 0 |
| On the Audio-visual Synchronization for Lip-to-Speech Synthesis | Mar 1, 2023 | Audio-Visual Synchronization, Lip to Speech Synthesis | Unverified | 0 | 0 |
| A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation | Jul 24, 2024 | Audio-Visual Synchronization | Unverified | 0 | 0 |
| Audio-driven Talking Face Generation with Stabilized Synchronization Loss | Jul 18, 2023 | Audio-Visual Synchronization, Face Generation | Unverified | 0 | 0 |
| Realistic Speech-Driven Facial Animation with GANs | Jun 14, 2019 | Audio-Visual Synchronization, Lip Reading | Unverified | 0 | 0 |
| RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network | Jun 26, 2024 | Audio-Visual Synchronization, Face Generation | Unverified | 0 | 0 |
| Rethinking Audio-visual Synchronization for Active Speaker Detection | Jun 21, 2022 | Active Speaker Detection, Audio-Visual Synchronization | Unverified | 0 | 0 |
| UniSync: A Unified Framework for Audio-Visual Synchronization | Mar 20, 2025 | Audio-Visual Synchronization, Contrastive Learning | Unverified | 0 | 0 |
| DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | Mar 28, 2025 | Audio Generation, Audio-Visual Synchronization | Unverified | 0 | 0 |
| Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | Sep 10, 2024 | Audio Synthesis, Audio-Visual Synchronization | Unverified | 0 | 0 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | Jan 22, 2024 | AudioCaps, Audio-Visual Synchronization | Unverified | 0 | 0 |
| FaceDirector: Continuous Control of Facial Performance in Video | Dec 1, 2015 | Audio-Visual Synchronization, continuous-control | Unverified | 0 | 0 |
| FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis | Mar 6, 2025 | Audio-Visual Synchronization | Unverified | 0 | 0 |
| Identity-Preserving Realistic Talking Face Generation | May 25, 2020 | Audio-Visual Synchronization, Face Generation | Unverified | 0 | 0 |