SOTAVerified

Audio-Visual Synchronization

Papers

Showing 1-32 of 32 papers

Title | Status | Hype
MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Code | 9
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis | Code | 7
Synchformer: Efficient Synchronization from Sparse Cues | Code | 2
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | Code | 2
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices | Code | 1
Explicit Correlation Learning for Generalizable Cross-Modal Deepfake Detection | Code | 1
Multimodal Transformer Distillation for Audio-Visual Synchronization | Code | 1
Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet | Code | 1
PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores | Code | 1
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors | Code | 1
Target Active Speaker Detection with Audio-visual Cues | Code | 1
Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation | - | 0
Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation | - | 0
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning | - | 0
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory | - | 0
Comparative Analysis of Deep-Fake Algorithms | - | 0
Audio-Sync Video Generation with Multi-Stream Temporal Control | - | 0
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions | - | 0
On Attention Modules for Audio-Visual Synchronization | - | 0
On the Audio-visual Synchronization for Lip-to-Speech Synthesis | - | 0
A Comprehensive Review and Taxonomy of Audio-Visual Synchronization Techniques for Realistic Speech Animation | - | 0
Audio-driven Talking Face Generation with Stabilized Synchronization Loss | - | 0
Realistic Speech-Driven Facial Animation with GANs | - | 0
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network | - | 0
Rethinking Audio-visual Synchronization for Active Speaker Detection | - | 0
UniSync: A Unified Framework for Audio-Visual Synchronization | - | 0
DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation | - | 0
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis | - | 0
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | - | 0
FaceDirector: Continuous Control of Facial Performance in Video | - | 0
FREAK: Frequency-modulated High-fidelity and Real-time Audio-driven Talking Portrait Synthesis | - | 0
Identity-Preserving Realistic Talking Face Generation | - | 0

No leaderboard results yet.