SOTAVerified

Video Alignment

Papers

Showing 1-50 of 83 papers

Title | Status | Hype
HunyuanVideo: A Systematic Framework For Large Video Generative Models | Code | 11
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Code | 11
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | Code | 5
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Code | 4
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions | Code | 4
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Code | 3
Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Code | 3
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Code | 3
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Code | 3
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Code | 2
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models | Code | 2
VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment | Code | 2
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | Code | 2
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | Code | 1
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Code | 1
Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space | Code | 1
VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion | Code | 1
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Code | 1
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Code | 1
Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Code | 1
Learning a Grammar Inducer from Massive Uncurated Instructional Videos | Code | 1
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Code | 1
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Code | 1
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | Code | 1
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment | Code | 1
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
Time-Contrastive Networks: Self-Supervised Learning from Video | Code | 1
VADER: Video Alignment Differencing and Retrieval | - | 0
A Comprehensive Review of Few-shot Action Recognition | - | 0
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | - | 0
AniClipart: Clipart Animation with Text-to-Video Priors | - | 0
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | - | 0
Audio-Sync Video Generation with Multi-Stream Temporal Control | - | 0
Book2Movie: Aligning Video Scenes With Book Chapters | - | 0
ContentCTR: Frame-level Live Streaming Click-Through Rate Prediction with Multimodal Transformer | - | 0
DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | - | 0
DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | - | 0
FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing | - | 0
Frequency-aware Event-based Video Deblurring for Real-World Motion Blur | - | 0
Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | - | 0
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content | - | 0
Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion | - | 0
Learning by Aligning Videos in Time | - | 0
Learning Robust Video Synchronization without Annotations | - | 0
Learning to Align Images using Weak Geometric Supervision | - | 0
Learning to Ground Instructional Articles in Videos through Narrations | - | 0
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment | - | 0
Learning to Predict Activity Progress by Self-Supervised Video Alignment | - | 0
STELLA: Continual Audio-Video Pre-training with Spatio-Temporal Localized Alignment | - | 0

No leaderboard results yet.