| Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation | Jun 24, 2025 | Audio Generation, Audio-Visual Synchronization | Unverified | 0 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | Unverified | 0 |
| OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | May 27, 2025 | Audio-Visual Synchronization, Conversational Response Generation | Unverified | 0 |
| CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | May 6, 2025 | Active Speaker Detection, Audio-Visual Speech Recognition | Code Available | 2 |
| DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | Mar 28, 2025 | Audio Generation, Audio-Visual Synchronization | Unverified | 0 |
| UniSync: A Unified Framework for Audio-Visual Synchronization | Mar 20, 2025 | Audio-Visual Synchronization, Contrastive Learning | Unverified | 0 |
| FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis | Mar 6, 2025 | Audio-Visual Synchronization | Unverified | 0 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Dec 19, 2024 | Audio Generation, Audio Synthesis | Code Available | 7 |
| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 |
| Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | Sep 10, 2024 | Audio Synthesis, Audio-Visual Synchronization | Unverified | 0 |