| Show-o2: Improved Native Unified Multimodal Models | Jun 18, 2025 | Language Modeling, Language Modelling | Code Available | 5 |
| StableAnimator: High-Quality Identity-Preserving Human Image Animation | Nov 26, 2024 | Denoising, Face Reenactment | Code Available | 5 |
| HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation | May 7, 2025 | Human-Domain Subject-to-Video, Single-Domain Subject-to-Video | Code Available | 5 |
| Tora: Trajectory-oriented Diffusion Transformer for Video Generation | Jul 31, 2024 | Video Compression, Video Generation | Code Available | 5 |
| Phantom: Subject-consistent video generation via cross-modal alignment | Feb 16, 2025 | cross-modal alignment, Human-Domain Subject-to-Video | Code Available | 5 |
| OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation | Jun 2, 2025 | Data Augmentation, Human Animation | Code Available | 5 |
| DanceGRPO: Unleashing GRPO on Visual Generation | May 12, 2025 | Denoising, reinforcement-learning | Code Available | 5 |
| GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control | Mar 5, 2025 | Novel View Synthesis, Video Generation | Code Available | 5 |
| VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild | Nov 27, 2022 | Video Editing, Video Generation | Code Available | 5 |
| StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text | Mar 21, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators | Apr 7, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models | Jan 14, 2025 | Benchmarking, Text-to-Video Generation | Code Available | 4 |
| MotionClone: Training-Free Motion Cloning for Controllable Video Generation | Jun 8, 2024 | Denoising, Motion Generation | Code Available | 4 |
| Diffusion Models: A Comprehensive Survey of Methods and Applications | Sep 2, 2022 | Image Generation, Image Super-Resolution | Code Available | 4 |
| UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Jun 3, 2024 | Image Animation, Video Generation | Code Available | 4 |
| Unified Reward Model for Multimodal Understanding and Generation | Mar 7, 2025 | Image Generation, model | Code Available | 4 |
| Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Mar 13, 2024 | Image Animation, Image to Video Generation | Code Available | 4 |
| Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | Dec 22, 2022 | Style Transfer, Text-to-Video Generation | Code Available | 4 |
| Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation | Apr 21, 2025 | Video Generation | Code Available | 4 |
| MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions | Jul 8, 2024 | Video Alignment, Video Generation | Code Available | 4 |
| DiffuEraser: A Diffusion Model for Video Inpainting | Jan 17, 2025 | model, Optical Flow Estimation | Code Available | 4 |
| TransPixeler: Advancing Text-to-Video Generation with Transparency | Jan 6, 2025 | Text-to-Video Generation, Video Generation | Code Available | 4 |
| VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation | Mar 15, 2023 | Code Generation, Denoising | Code Available | 4 |
| FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance | Aug 15, 2024 | TAR, Video Generation | Code Available | 4 |
| Taming Rectified Flow for Inversion and Editing | Nov 7, 2024 | Image Generation, Text-to-Image Generation | Code Available | 4 |
| Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control | Jan 7, 2025 | Video Generation | Code Available | 4 |
| A Survey on Video Diffusion Models | Oct 16, 2023 | Image Generation, Survey | Code Available | 4 |
| SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints | Dec 10, 2024 | 4D reconstruction, Video Generation | Code Available | 4 |
| Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators | Mar 23, 2023 | Image Generation, Text-to-Video Generation | Code Available | 4 |
| Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Feb 5, 2024 | Science Question Answering, Text-to-Video Generation | Code Available | 4 |
| Enhance-A-Video: Better Generated Video for Free | Feb 11, 2025 | Video Generation | Code Available | 4 |
| Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models | Feb 27, 2024 | Marketing, Video Generation | Code Available | 4 |
| SpargeAttention: Accurate and Training-free Sparse Attention Accelerating Any Model Inference | Feb 25, 2025 | model, Video Generation | Code Available | 4 |
| SkyReels-A2: Compose Anything in Video Diffusion Transformers | Apr 3, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 4 |
| Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | Oct 9, 2023 | Action Recognition, Image Generation | Code Available | 4 |
| Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond | May 6, 2024 | Autonomous Driving, Decision Making | Code Available | 4 |
| Identity-Preserving Text-to-Video Generation by Frequency Decomposition | Nov 26, 2024 | Human-Domain Subject-to-Video, Image to Video Generation | Code Available | 4 |
| Phased Consistency Models | May 28, 2024 | Image Generation, Video Generation | Code Available | 4 |
| DreamGen: Unlocking Generalization in Robot Learning through Video World Models | May 19, 2025 | Video Generation | Code Available | 4 |
| AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks | Mar 21, 2024 | Image to Video Generation, Style Transfer | Code Available | 4 |
| HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models | Oct 30, 2024 | Video Generation | Code Available | 4 |
| Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Feb 29, 2024 | Retrieval, Text Retrieval | Code Available | 4 |
| CameraCtrl: Enabling Camera Control for Text-to-Video Generation | Apr 2, 2024 | Text-to-Video Generation, Video Generation | Code Available | 4 |
| NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis | Jul 20, 2022 | Image Outpainting, Text-to-Image Generation | Code Available | 4 |
| OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation | May 26, 2025 | Human-Domain Subject-to-Video, Open-Domain Subject-to-Video | Code Available | 4 |
| Autoregressive Video Generation without Vector Quantization | Dec 18, 2024 | Image Generation, Prediction | Code Available | 4 |
| Autoregressive Models in Vision: A Survey | Nov 8, 2024 | 3D Generation, Image Generation | Code Available | 4 |
| MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | May 30, 2024 | Image Animation, Video Generation | Code Available | 4 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 |
| AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data | Feb 1, 2024 | Conditional Image Generation, Denoising | Code Available | 4 |