SOTAVerified

Audio-Visual Synchronization

Papers

Showing 1–25 of 32 papers

| Title | Status | Hype |
|---|---|---|
| Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation | | 0 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | | 0 |
| OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | | 0 |
| CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | Code | 2 |
| DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | | 0 |
| UniSync: A Unified Framework for Audio-Visual Synchronization | | 0 |
| FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis | | 0 |
| MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Code | 7 |
| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Code | 9 |
| Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | | 0 |
| A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation | | 0 |
| RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network | | 0 |
| Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection | Code | 1 |
| PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores | Code | 1 |
| Synchformer: Efficient Synchronization from Sparse Cues | Code | 2 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | | 0 |
| Comparative Analysis of Deep-Fake Algorithms | | 0 |
| Audio-driven Talking Face Generation with Stabilized Synchronization Loss | | 0 |
| Target Active Speaker Detection with Audio-visual Cues | Code | 1 |
| On the Audio-visual Synchronization for Lip-to-Speech Synthesis | | 0 |
| SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory | | 0 |
| Multimodal Transformer Distillation for Audio-Visual Synchronization | Code | 1 |
| Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors | Code | 1 |
| Rethinking Audio-visual Synchronization for Active Speaker Detection | | 0 |
| VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices | Code | 1 |

No leaderboard results yet.