| MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation | Feb 18, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| LLMPopcorn: An Empirical Study of LLMs as Assistants for Popular Micro-video Generation | Feb 18, 2025 | Benchmarking, Text Generation | Unverified | 0 |
| Object-Centric Image to Video Generation with Language Guidance | Feb 17, 2025 | Image to Video Generation, Object | Code Available | 1 |
| DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation | Feb 17, 2025 | Video Generation | Code Available | 1 |
| Phantom: Subject-consistent video generation via cross-modal alignment | Feb 16, 2025 | cross-modal alignment, Human-Domain Subject-to-Video | Code Available | 5 |
| MaskFlow: Discrete Flows For Flexible and Efficient Long Video Generation | Feb 16, 2025 | Video Generation | Unverified | 0 |
| RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control | Feb 14, 2025 | 3D Scene Reconstruction, Depth Estimation | Unverified | 0 |
| Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Feb 14, 2025 | Video Generation, Video Reconstruction | Code Available | 7 |
| GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation | Feb 13, 2025 | Contrastive Learning, Video Generation | Unverified | 0 |
| AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance | Feb 12, 2025 | Video Generation | Unverified | 0 |
| FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis | Feb 12, 2025 | Motion Synthesis, Optical Flow Estimation | Unverified | 0 |
| CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation | Feb 12, 2025 | Object, Text-to-Video Generation | Unverified | 0 |
| Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation | Feb 11, 2025 | Gesture Generation, Video Generation | Unverified | 0 |
| Enhance-A-Video: Better Generated Video for Free | Feb 11, 2025 | Video Generation | Code Available | 4 |
| Generative Ghost: Investigating Ranking Bias Hidden in AI-Generated Videos | Feb 11, 2025 | Contrastive Learning, Image Retrieval | Unverified | 0 |
| Next Block Prediction: Video Generation via Semi-Auto-Regressive Modeling | Feb 11, 2025 | Video Generation | Unverified | 0 |
| Magic 1-For-1: Generating One Minute Video Clips within One Minute | Feb 11, 2025 | Image Generation, Image to Video Generation | Code Available | 0 |
| Articulate That Object Part (ATOP): 3D Part Articulation from Text and Motion Personalization | Feb 11, 2025 | Image Generation, Motion Generation | Unverified | 0 |
| VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation | Feb 11, 2025 | Image to Video Generation, Object | Unverified | 0 |
| History-Guided Video Diffusion | Feb 10, 2025 | Video Generation | Code Available | 3 |
| Señorita-2M: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists | Feb 10, 2025 | Video Editing, Video Generation | Unverified | 0 |
| TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models | Feb 10, 2025 | 3D Generation, 3D Reconstruction | Code Available | 5 |
| CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers | Feb 10, 2025 | Image Generation, Video Generation | Unverified | 0 |
| Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile | Feb 10, 2025 | Video Generation | Code Available | 7 |
| Conditional diffusion model with spatial attention and latent embedding for medical image segmentation | Feb 10, 2025 | Hippocampus, Image Segmentation | Code Available | 1 |
| A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction | Feb 8, 2025 | Model Optimization, Optical Flow Estimation | Code Available | 0 |
| HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation | Feb 7, 2025 | Form, Pose Transfer | Unverified | 0 |
| FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | Feb 7, 2025 | Computational Efficiency, Text-to-Video Generation | Code Available | 3 |
| Goku: Flow Based Video Generative Foundation Models | Feb 7, 2025 | Image Generation, Text to Image Generation | Code Available | 7 |
| Fast Video Generation with Sliding Tile Attention | Feb 6, 2025 | Video Generation | Code Available | 7 |
| Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency | Feb 6, 2025 | Video Generation, Video Quality Assessment | Code Available | 1 |
| MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation | Feb 6, 2025 | Image to Video Generation, Video Editing | Unverified | 0 |
| UniCP: A Unified Caching and Pruning Framework for Efficient Video Generation | Feb 6, 2025 | Computational Efficiency, Video Generation | Unverified | 0 |
| Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression | Feb 6, 2025 | Computational Efficiency, Video Generation | Unverified | 0 |
| UniForm: A Unified Multi-Task Diffusion Transformer for Audio-Video Generation | Feb 6, 2025 | Audio Generation, Diversity | Unverified | 0 |
| Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach | Feb 5, 2025 | Video Generation | Unverified | 0 |
| FreqPrior: Improving Video Diffusion Models with Frequency Filtering Gaussian Noise | Feb 5, 2025 | Video Generation | Unverified | 0 |
| On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices | Feb 5, 2025 | Denoising, Model Optimization | Code Available | 2 |
| MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent | Feb 5, 2025 | Image to Video Generation, Motion Generation | Unverified | 0 |
| Harness Local Rewards for Global Benefits: Effective Text-to-Video Generation Alignment with Patch-level Reward Models | Feb 4, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models | Feb 4, 2025 | Motion Generation, motion prediction | Unverified | 0 |
| IPO: Iterative Preference Optimization for Text-to-Video Generation | Feb 4, 2025 | Large Language Model, Text-to-Video Generation | Unverified | 0 |
| Improved Training Technique for Latent Consistency Models | Feb 3, 2025 | Video Generation | Code Available | 1 |
| VidSketch: Hand-drawn Sketch-Driven Video Generation with Diffusion Control | Feb 3, 2025 | Video Generation | Code Available | 1 |
| OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models | Feb 3, 2025 | Human Animation, Human-Object Interaction Detection | Unverified | 0 |
| Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity | Feb 3, 2025 | Video Generation | Unverified | 0 |
| MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | Feb 3, 2025 | Benchmarking, Fairness | Unverified | 0 |
| Secure & Personalized Music-to-Video Generation via CHARCHA | Feb 3, 2025 | Rhythm, Video Generation | Unverified | 0 |
| VILP: Imitation Learning with Latent Video Planning | Feb 3, 2025 | Imitation Learning, Video Generation | Code Available | 1 |
| Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer | Feb 2, 2025 | Reinforcement Learning (RL), Video Generation | Unverified | 0 |