SOTAVerified

Video Alignment

Papers

Showing 1-50 of 83 papers

Title | Status | Hype
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Code | 11
HunyuanVideo: A Systematic Framework For Large Video Generative Models | Code | 11
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | Code | 5
MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions | Code | 4
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Code | 4
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Code | 3
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Code | 3
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Code | 3
Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Code | 3
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | Code | 2
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models | Code | 2
VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment | Code | 2
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | Code | 2
Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space | Code | 1
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | Code | 1
Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Code | 1
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Code | 1
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Code | 1
Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Code | 1
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Code | 1
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment | Code | 1
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Code | 1
Time-Contrastive Networks: Self-Supervised Learning from Video | Code | 1
Learning a Grammar Inducer from Massive Uncurated Instructional Videos | Code | 1
Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | Code | 1
VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion | Code | 1
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | Code | 1
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Code | 1
A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference | Code | 0
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Code | 0
Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video | Code | 0
Dynamic Temporal Alignment of Speech to Lips | Code | 0
Learning from Video and Text via Large-Scale Discriminative Clustering | Code | 0
View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose | Code | 0
View-Invariant Probabilistic Embedding for Human Pose | Code | 0
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations | Code | 0
Deep Understanding of Sign Language for Sign to Subtitle Alignment | Code | 0
Listen Then See: Video Alignment with Speaker Attention | Code | 0
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | Code | 0
Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Code | 0
Temporal Cycle-Consistency Learning | Code | 0
LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers | Code | 0
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Code | 0
Edit As You Wish: Video Caption Editing with Multi-grained User Control | Code | 0
VADER: Video Alignment Differencing and Retrieval | - | 0
A Comprehensive Review of Few-shot Action Recognition | - | 0
Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | - | 0
AniClipart: Clipart Animation with Text-to-Video Priors | - | 0
Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | - | 0
Audio-Sync Video Generation with Multi-Stream Temporal Control | - | 0

No leaderboard results yet.