| HuViDPO: Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment | Feb 2, 2025 | Video Generation | —Unverified | 0 |
| Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation | Feb 1, 2025 | Image Generation, Video Generation | —Unverified | 0 |
| Shape from Semantics: 3D Shape Generation from Multi-View Semantics | Feb 1, 2025 | 3D geometry, 3D Shape Generation | —Unverified | 0 |
| Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search | Jan 31, 2025 | Denoising, Video Alignment | Code Available | 1 |
| Every Image Listens, Every Image Dances: Music-Driven Image Animation | Jan 30, 2025 | Image Animation, Video Generation | —Unverified | 0 |
| CascadeV: An Implementation of Wurstchen Architecture for Video Generation | Jan 28, 2025 | 2k, Video Generation | Code Available | 1 |
| VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking | Jan 24, 2025 | Denoising, Image Generation | Code Available | 2 |
| EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion | Jan 23, 2025 | Video Generation | Code Available | 1 |
| Improving Video Generation with Human Feedback | Jan 23, 2025 | Video Generation | —Unverified | 0 |
| Taming Teacher Forcing for Masked Autoregressive Video Generation | Jan 21, 2025 | Video Generation, Video Prediction | —Unverified | 0 |
| Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Jan 21, 2025 | Computational Efficiency, Depth Estimation | Code Available | 5 |
| CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Jan 20, 2025 | Video Generation, Virtual Try-on | Code Available | 3 |
| GenVidBench: A Challenging Benchmark for Detecting AI-Generated Video | Jan 20, 2025 | Video Classification, Video Generation | —Unverified | 0 |
| EMO2: End-Effector Guided Audio-Driven Avatar Video Generation | Jan 18, 2025 | Gesture Generation, Video Generation | —Unverified | 0 |
| RichSpace: Enriching Text-to-Video Prompt Space via Text Embedding Interpolation | Jan 17, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| DiffuEraser: A Diffusion Model for Video Inpainting | Jan 17, 2025 | model, Optical Flow Estimation | Code Available | 4 |
| VideoWorld: Exploring Knowledge Learning from Unlabeled Videos | Jan 16, 2025 | Video Generation | —Unverified | 0 |
| Learnings from Scaling Visual Tokenizers for Reconstruction and Generation | Jan 16, 2025 | Decoder, Image Generation | —Unverified | 0 |
| RepVideo: Rethinking Cross-Layer Representation for Video Generation | Jan 15, 2025 | Video Generation | —Unverified | 0 |
| Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion | Jan 15, 2025 | Denoising, Video Denoising | —Unverified | 0 |
| Comprehensive Subjective and Objective Evaluation Method for Text-generated Video | Jan 15, 2025 | Video Generation | —Unverified | 0 |
| Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models | Jan 14, 2025 | Benchmarking, Text-to-Video Generation | Code Available | 4 |
| 3D Gaussian Splatting with Normal Information for Mesh Extraction and Improved Rendering | Jan 14, 2025 | Novel View Synthesis, Video Generation | —Unverified | 0 |
| Do generative video models understand physical principles? | Jan 14, 2025 | Video Generation | Code Available | 3 |
| GameFactory: Creating New Games with Generative Interactive Videos | Jan 14, 2025 | Domain Generalization, Minecraft | —Unverified | 0 |
| LayerAnimate: Layer-specific Control for Animation | Jan 14, 2025 | Video Generation | —Unverified | 0 |
| FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors | Jan 14, 2025 | Image to Video Generation, Video Generation | Code Available | 3 |
| Diffusion Adversarial Post-Training for One-Step Video Generation | Jan 14, 2025 | Video Generation | —Unverified | 0 |
| BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations | Jan 13, 2025 | Object, Text-to-Video Generation | —Unverified | 0 |
| Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss | Jan 13, 2025 | Feature Correlation, Video Generation | —Unverified | 0 |
| HeteroLLM: Accelerating Large Language Model Inference on Mobile SoCs platform with Heterogeneous AI Accelerators | Jan 11, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Qffusion: Controllable Portrait Video Editing via Quadrant-Grid Attention Learning | Jan 11, 2025 | Video Editing, Video Generation | —Unverified | 0 |
| MEt3R: Measuring Multi-View Consistency in Generated Images | Jan 10, 2025 | Image Generation, Video Generation | —Unverified | 0 |
| VideoAuteur: Towards Long Narrative Video Generation | Jan 10, 2025 | Video Generation | —Unverified | 0 |
| Multi-subject Open-set Personalization in Video Generation | Jan 10, 2025 | Video Generation | —Unverified | 0 |
| Progressive Growing of Video Tokenizers for Highly Compressed Latent Spaces | Jan 9, 2025 | Video Generation | —Unverified | 0 |
| Tuning-Free Long Video Generation via Global-Local Collaborative Diffusion | Jan 8, 2025 | Denoising, Diversity | —Unverified | 0 |
| LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | Jan 8, 2025 | Lip Reading, speech-recognition | —Unverified | 0 |
| ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning | Jan 8, 2025 | Text-to-Video Generation, Video Generation | —Unverified | 0 |
| Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control | Jan 7, 2025 | Video Generation | Code Available | 4 |
| Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers | Jan 7, 2025 | Diversity, Text-to-Video Generation | Code Available | 2 |
| Motion-Aware Generative Frame Interpolation | Jan 7, 2025 | Video Generation | —Unverified | 0 |
| License Plate Images Generation with Diffusion Models | Jan 6, 2025 | License Plate Recognition, Synthetic Data Generation | —Unverified | 0 |
| Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation | Jan 6, 2025 | Image to Video Generation, Object | —Unverified | 0 |
| TransPixeler: Advancing Text-to-Video Generation with Transparency | Jan 6, 2025 | Text-to-Video Generation, Video Generation | Code Available | 4 |
| Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising | Jan 6, 2025 | Denoising, Video Generation | —Unverified | 0 |
| GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking | Jan 5, 2025 | Novel View Synthesis, Point Tracking | —Unverified | 0 |
| A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Jan 4, 2025 | Fairness, Hallucination | Code Available | 4 |
| JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing | Jan 3, 2025 | 3D Reconstruction, Face Generation | Code Available | 3 |
| VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control | Jan 2, 2025 | Talking Head Generation, Video Generation | —Unverified | 0 |