| Target Active Speaker Detection with Audio-visual Cues | May 22, 2023 | Active Speaker Detection, Audio-Visual Synchronization | Code Available | 1 |
| Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation | Jun 24, 2025 | Audio Generation, Audio-Visual Synchronization | Unverified | 0 |
| Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation | Mar 25, 2021 | Audio-Visual Synchronization, Speech Separation | Unverified | 0 |
| Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning | Aug 13, 2020 | Action Recognition, Audio-Visual Synchronization | Unverified | 0 |
| SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory | Nov 2, 2022 | Audio-Visual Synchronization, Face Generation | Unverified | 0 |
| Comparative Analysis of Deep-Fake Algorithms | Sep 6, 2023 | Audio-Visual Synchronization, DeepFake Detection | Unverified | 0 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | Unverified | 0 |
| OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | May 27, 2025 | Audio-Visual Synchronization, Conversational Response Generation | Unverified | 0 |
| On Attention Modules for Audio-Visual Synchronization | Dec 14, 2018 | Audio-Visual Synchronization | Unverified | 0 |
| On the Audio-visual Synchronization for Lip-to-Speech Synthesis | Mar 1, 2023 | Audio-Visual Synchronization, Lip to Speech Synthesis | Unverified | 0 |