| HunyuanVideo: A Systematic Framework For Large Video Generative Models | Dec 3, 2024 | Video AlignmentVideo Generation | CodeCode Available | 11 |
| CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Aug 12, 2024 | Text-to-Video GenerationVideo Alignment | CodeCode Available | 11 |
| HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | May 7, 2025 | Human-Domain Subject-to-VideoSingle-Domain Subject-to-Video | CodeCode Available | 5 |
| MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions | Jul 8, 2024 | Video AlignmentVideo Generation | CodeCode Available | 4 |
| FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Jul 1, 2024 | Audio GenerationVideo Alignment | CodeCode Available | 4 |
| Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Apr 5, 2025 | 3D GenerationVideo Alignment | CodeCode Available | 3 |
| T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Oct 8, 2024 | Video AlignmentVideo Generation | CodeCode Available | 3 |
| CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Mar 18, 2024 | Image InpaintingVideo Alignment | CodeCode Available | 3 |
| Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Sep 27, 2023 | GPUText-to-Video Generation | CodeCode Available | 3 |
| Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | May 29, 2025 | Portrait AnimationVideo Alignment | CodeCode Available | 2 |
| VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment | Aug 21, 2024 | Video AlignmentVideo Editing | CodeCode Available | 2 |
| AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | Jan 3, 2024 | Video AlignmentVideo Generation | CodeCode Available | 2 |
| Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models | Mar 30, 2023 | Video AlignmentVideo Editing | CodeCode Available | 2 |
| DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Jun 10, 2025 | Image CaptioningRetrieval | CodeCode Available | 1 |
| LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | May 17, 2025 | BenchmarkingQuestion Answering | CodeCode Available | 1 |
| VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion | Mar 11, 2025 | Image MattingVideo Alignment | CodeCode Available | 1 |
| Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Jan 31, 2025 | DenoisingVideo Alignment | CodeCode Available | 1 |
| Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Sep 8, 2024 | Emotion RecognitionMamba | CodeCode Available | 1 |
| SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Jun 20, 2024 | Safety AlignmentText-to-Video Generation | CodeCode Available | 1 |
| Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment | Mar 18, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 |
| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Oct 17, 2023 | BenchmarkingLanguage Modelling | CodeCode Available | 1 |
| Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Jun 15, 2023 | Action ClassificationAction Recognition | CodeCode Available | 1 |
| Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | May 18, 2023 | Image GenerationText to Image Generation | CodeCode Available | 1 |
| Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | Mar 22, 2023 | Representation LearningSentence | CodeCode Available | 1 |
| Learning a Grammar Inducer from Massive Uncurated Instructional Videos | Oct 22, 2022 | Language AcquisitionVideo Alignment | CodeCode Available | 1 |
| Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space | Jun 23, 2022 | Action Recognitionimage-classification | CodeCode Available | 1 |
| Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Mar 28, 2022 | Action ClassificationContrastive Learning | CodeCode Available | 1 |
| Time-Contrastive Networks: Self-Supervised Learning from Video | Apr 23, 2017 | Metric Learningreinforcement-learning | CodeCode Available | 1 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual SynchronizationVideo Alignment | —Unverified | 0 |
| DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | May 11, 2025 | parameter-efficient fine-tuningVideo Alignment | —Unverified | 0 |
| DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | Apr 21, 2025 | AttributeDenoising | —Unverified | 0 |
| Deep Understanding of Sign Language for Sign to Subtitle Alignment | Mar 5, 2025 | TranslationVideo Alignment | CodeCode Available | 0 |
| Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | Jan 1, 2025 | Action RecognitionScene Recognition | CodeCode Available | 0 |
| Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance | Dec 24, 2024 | Audio GenerationVideo Alignment | —Unverified | 0 |
| Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback | Dec 3, 2024 | ObjectOffline RL | —Unverified | 0 |
| VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement | Nov 22, 2024 | Text-to-Video GenerationVideo Alignment | —Unverified | 0 |
| Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Nov 22, 2024 | Autonomous DrivingText-to-Video Generation | CodeCode Available | 0 |
| Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content | Oct 10, 2024 | Video AlignmentVideo Generation | —Unverified | 0 |
| Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment | Sep 22, 2024 | Contrastive Learningcross-modal alignment | —Unverified | 0 |
| Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Sep 6, 2024 | Action RecognitionContrastive Learning | CodeCode Available | 0 |
| Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets | Sep 2, 2024 | Video AlignmentVideo Editing | —Unverified | 0 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | BenchmarkingLarge Language Model | CodeCode Available | 0 |
| A Comprehensive Review of Few-shot Action Recognition | Jul 20, 2024 | Action RecognitionFew-Shot action recognition | —Unverified | 0 |
| Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports | Jul 11, 2024 | Video Alignment | —Unverified | 0 |
| Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | Jul 3, 2024 | Contrastive LearningLanguage Modelling | —Unverified | 0 |
| Listen Then See: Video Alignment with Speaker Attention | Apr 21, 2024 | cross-modal alignmentQuestion Answering | CodeCode Available | 0 |
| AniClipart: Clipart Animation with Text-to-Video Priors | Apr 18, 2024 | Image to Video GenerationText-to-Video Generation | —Unverified | 0 |
| Scaling Up Video Summarization Pretraining with Large Language Models | Apr 4, 2024 | Video AlignmentVideo Summarization | —Unverified | 0 |
| The Effects of Short Video-Sharing Services on Video Copy Detection | Mar 26, 2024 | Copy DetectionVideo Alignment | —Unverified | 0 |
| FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing | Mar 10, 2024 | Image GenerationText-to-Video Editing | —Unverified | 0 |