| CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Aug 12, 2024 | Text-to-Video Generation, Video Alignment | Code Available | 11 | 5 |
| HunyuanVideo: A Systematic Framework For Large Video Generative Models | Dec 3, 2024 | Video Alignment, Video Generation | Code Available | 11 | 5 |
| HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | May 7, 2025 | Human-Domain Subject-to-Video, Single-Domain Subject-to-Video | Code Available | 5 | 5 |
| MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions | Jul 8, 2024 | Video Alignment, Video Generation | Code Available | 4 | 5 |
| FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds | Jul 1, 2024 | Audio Generation, Video Alignment | Code Available | 4 | 5 |
| T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design | Oct 8, 2024 | Video Alignment, Video Generation | Code Available | 3 | 5 |
| Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Sep 27, 2023 | GPU, Text-to-Video Generation | Code Available | 3 | 5 |
| CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Mar 18, 2024 | Image Inpainting, Video Alignment | Code Available | 3 | 5 |
| Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Apr 5, 2025 | 3D Generation, Video Alignment | Code Available | 3 | 5 |
| Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization and Temporal Motion Modulation | May 29, 2025 | Portrait Animation, Video Alignment | Code Available | 2 | 5 |
| Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models | Mar 30, 2023 | Video Alignment, Video Editing | Code Available | 2 | 5 |
| VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment | Aug 21, 2024 | Video Alignment, Video Editing | Code Available | 2 | 5 |
| AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI | Jan 3, 2024 | Video Alignment, Video Generation | Code Available | 2 | 5 |
| Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space | Jun 23, 2022 | Action Recognition, image-classification | Code Available | 1 | 5 |
| LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation | May 17, 2025 | Benchmarking, Question Answering | Code Available | 1 | 5 |
| Mamba-Enhanced Text-Audio-Video Alignment Network for Emotion Recognition in Conversations | Sep 8, 2024 | Emotion Recognition, Mamba | Code Available | 1 | 5 |
| EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | Oct 17, 2023 | Benchmarking, Language Modelling | Code Available | 1 | 5 |
| SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Jun 20, 2024 | Safety Alignment, Text-to-Video Generation | Code Available | 1 | 5 |
| Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning | Mar 28, 2022 | Action Classification, Contrastive Learning | Code Available | 1 | 5 |
| Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers | Jun 15, 2023 | Action Classification, Action Recognition | Code Available | 1 | 5 |
| Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment | Mar 18, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Jan 31, 2025 | Denoising, Video Alignment | Code Available | 1 | 5 |
| Time-Contrastive Networks: Self-Supervised Learning from Video | Apr 23, 2017 | Metric Learning, reinforcement-learning | Code Available | 1 | 5 |
| Learning a Grammar Inducer from Massive Uncurated Instructional Videos | Oct 22, 2022 | Language Acquisition, Video Alignment | Code Available | 1 | 5 |
| Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | May 18, 2023 | Image Generation, Text to Image Generation | Code Available | 1 | 5 |
| VRMDiff: Text-Guided Video Referring Matting Generation of Diffusion | Mar 11, 2025 | Image Matting, Video Alignment | Code Available | 1 | 5 |
| Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos | Mar 22, 2023 | Representation Learning, Sentence | Code Available | 1 | 5 |
| DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval | Jun 10, 2025 | Image Captioning, Retrieval | Code Available | 1 | 5 |
| A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference | Jun 26, 2023 | Video Alignment | Code Available | 0 | 5 |
| Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking, Large Language Model | Code Available | 0 | 5 |
| Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video | Oct 21, 2019 | continuous-control, Continuous Control | Code Available | 0 | 5 |
| Dynamic Temporal Alignment of Speech to Lips | Aug 19, 2018 | Constrained Lip-synchronization, Video Alignment | Code Available | 0 | 5 |
| Learning from Video and Text via Large-Scale Discriminative Clustering | Jul 27, 2017 | Action Recognition, Clustering | Code Available | 0 | 5 |
| View-Invariant, Occlusion-Robust Probabilistic Embedding for Human Pose | Oct 23, 2020 | 3D Pose Estimation, Action Recognition | Code Available | 0 | 5 |
| View-Invariant Probabilistic Embedding for Human Pose | Dec 2, 2019 | Action Recognition, Pose Retrieval | Code Available | 0 | 5 |
| Aligning Step-by-Step Instructional Diagrams to Video Demonstrations | Mar 24, 2023 | Contrastive Learning, Image Retrieval | Code Available | 0 | 5 |
| Deep Understanding of Sign Language for Sign to Subtitle Alignment | Mar 5, 2025 | Translation, Video Alignment | Code Available | 0 | 5 |
| Listen Then See: Video Alignment with Speaker Attention | Apr 21, 2024 | cross-modal alignment, Question Answering | Code Available | 0 | 5 |
| Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | Jan 1, 2025 | Action Recognition, Scene Recognition | Code Available | 0 | 5 |
| Self-Supervised Contrastive Learning for Videos using Differentiable Local Alignment | Sep 6, 2024 | Action Recognition, Contrastive Learning | Code Available | 0 | 5 |
| Temporal Cycle-Consistency Learning | Apr 16, 2019 | Anomaly Detection, Representation Learning | Code Available | 0 | 5 |
| LAMV: Learning to Align and Match Videos With Kernelized Temporal Layers | Jun 1, 2018 | Copy Detection, Retrieval | Code Available | 0 | 5 |
| Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification | Nov 22, 2024 | Autonomous Driving, Text-to-Video Generation | Code Available | 0 | 5 |
| Edit As You Wish: Video Caption Editing with Multi-grained User Control | May 15, 2023 | Attribute, Position | Code Available | 0 | 5 |
| VADER: Video Alignment Differencing and Retrieval | Mar 23, 2023 | Misinformation, Retrieval | Unverified | 0 | 0 |
| A Comprehensive Review of Few-shot Action Recognition | Jul 20, 2024 | Action Recognition, Few-Shot action recognition | Unverified | 0 | 0 |
| Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | Jul 3, 2024 | Contrastive Learning, Language Modelling | Unverified | 0 | 0 |
| AniClipart: Clipart Animation with Text-to-Video Priors | Apr 18, 2024 | Image to Video Generation, Text-to-Video Generation | Unverified | 0 | 0 |
| Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | Jul 24, 2023 | Retrieval, Text to Video Retrieval | Unverified | 0 | 0 |
| Audio-Sync Video Generation with Multi-Stream Temporal Control | Jun 9, 2025 | Audio-Visual Synchronization, Video Alignment | Unverified | 0 | 0 |