SOTAVerified

Video Alignment

Papers

Showing 1-25 of 83 papers

Title | Status | Hype
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
Audio-Sync Video Generation with Multi-Stream Temporal Control | | 0
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Code | 2
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Code | 1
DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | | 0
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | Code | 5
DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | | 0
Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Code | 3
VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion | Code | 1
Deep Understanding of Sign Language for Sign to Subtitle Alignment | Code | 0
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Code | 1
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | Code | 0
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance | | 0
HunyuanVideo: A Systematic Framework For Large Video Generative Models | Code | 11
Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | | 0
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Code | 0
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | | 0
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content | | 0
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Code | 3
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment | | 0
Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Code | 1
Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Code | 0
Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets | | 0
VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment | Code | 2
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Code | 11

No leaderboard results yet.