| Face Consistency Benchmark for GenAI Video | May 16, 2025 | Video Generation | Unverified | 0 |
| ToonifyGB: StyleGAN-based Gaussian Blendshapes for 3D Stylized Head Avatars | May 15, 2025 | Image Stylization, Video Generation | Unverified | 0 |
| Generating time-consistent dynamics with discriminator-guided image diffusion models | May 14, 2025 | Video Generation | Unverified | 0 |
| Aquarius: A Family of Industry-Level Video Generation Models for Marketing Scenarios | May 14, 2025 | Marketing, Video Generation | Unverified | 0 |
| ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models | May 12, 2025 | Video Generation | Unverified | 0 |
| Generative Pre-trained Autoregressive Diffusion Transformer | May 12, 2025 | Few-Shot Learning, Video Generation | Unverified | 0 |
| DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models | May 11, 2025 | parameter-efficient fine-tuning, Video Alignment | Unverified | 0 |
| BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation | May 11, 2025 | Video Generation | Unverified | 0 |
| ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images | May 10, 2025 | Denoising, Video Generation | Unverified | 0 |
| T2VTextBench: A Human Evaluation Benchmark for Textual Control in Video Generation Models | May 8, 2025 | Instruction Following, Text-to-Video Generation | Unverified | 0 |
| Real-Time Person Image Synthesis Using a Flow Matching Model | May 6, 2025 | Image Generation, Video Generation | Code Available | 0 |
| Transformers for Learning on Noisy and Task-Level Manifolds: Approximation and Generalization Insights | May 6, 2025 | Video Generation | Unverified | 0 |
| A Unit Enhancement and Guidance Framework for Audio-Driven Avatar Video Generation | May 6, 2025 | Human Animation, Video Generation | Unverified | 0 |
| DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization | May 4, 2025 | Denoising, Text-to-Video Generation | Unverified | 0 |
| PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | May 3, 2025 | Autonomous Driving, Camera Pose Estimation | Unverified | 0 |
| T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation | May 1, 2025 | counterfactual, Instruction Following | Unverified | 0 |
| Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis | Apr 30, 2025 | Disparity Estimation, Transparent objects | Unverified | 0 |
| Capturing Conditional Dependence via Auto-regressive Diffusion Models | Apr 30, 2025 | Video Generation | Unverified | 0 |
| ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction | Apr 30, 2025 | Video Generation | Unverified | 0 |
| TesserAct: Learning 4D Embodied World Models | Apr 29, 2025 | Novel View Synthesis, Video Generation | Unverified | 0 |
| DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer | Apr 28, 2025 | Video Generation | Unverified | 0 |
| Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation | Apr 26, 2025 | Form, Video Generation | Unverified | 0 |
| We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback | Apr 24, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Subject-driven Video Generation via Disentangled Identity and Motion | Apr 23, 2025 | Subject-driven Video Generation, Video Generation | Unverified | 0 |
| ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance | Apr 23, 2025 | Instruction Following, SSIM | Unverified | 0 |
| DiTPainter: Efficient Video Inpainting with Diffusion Transformers | Apr 22, 2025 | Video Generation, Video Inpainting | Unverified | 0 |
| Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Apr 22, 2025 | Large Language Model, reinforcement-learning | Unverified | 0 |
| Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform | Apr 21, 2025 | Boundary Detection, Optical Character Recognition (OCR) | Unverified | 0 |
| DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | Apr 21, 2025 | Attribute, Denoising | Unverified | 0 |
| Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis | Apr 20, 2025 | 2k, Knowledge Distillation | Unverified | 0 |
| Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM | Apr 16, 2025 | Large Language Model, Text-to-Video Generation | Unverified | 0 |
| The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation | Apr 16, 2025 | Sentence, Text-to-Video Generation | Unverified | 0 |
| OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | Apr 15, 2025 | Semantic Segmentation, Video Generation | Unverified | 0 |
| InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation | Apr 15, 2025 | Denoising, Video Generation | Unverified | 0 |
| VideoPanda: Video Panoramic Diffusion with Multi-view Attention | Apr 15, 2025 | Video Generation | Unverified | 0 |
| FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos | Apr 14, 2025 | Video Generation | Unverified | 0 |
| H-MoRe: Learning Human-centric Motion Representation for Action Analysis | Apr 14, 2025 | Action Analysis, Action Recognition | Code Available | 0 |
| H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models | Apr 14, 2025 | Denoising, Text-to-Video Generation | Unverified | 0 |
| CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models | Apr 13, 2025 | Video Editing, Video Generation | Unverified | 0 |
| Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Apr 11, 2025 | Denoising, Object | Unverified | 0 |
| TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation | Apr 11, 2025 | Disentanglement, Video Generation | Unverified | 0 |
| EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model | Apr 11, 2025 | Gesture Generation, Video Generation | Unverified | 0 |
| Diffusion Models for Robotic Manipulation: A Survey | Apr 11, 2025 | Data Augmentation, Image Augmentation | Unverified | 0 |
| Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model | Apr 11, 2025 | GPU, Video Generation | Unverified | 0 |
| Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Apr 10, 2025 | Question Answering, Video Generation | Unverified | 0 |
| One-Minute Video Generation with Test-Time Training | Apr 7, 2025 | Mamba, Video Generation | Unverified | 0 |
| MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition | Apr 3, 2025 | Code Generation, Image to Video Generation | Unverified | 0 |
| Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments | Apr 3, 2025 | Physical Commonsense Reasoning, Video Generation | Unverified | 0 |
| ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer | Apr 3, 2025 | Disentanglement, Motion Disentanglement | Code Available | 0 |
| OmniCam: Unified Multimodal Video Generation via Camera Control | Apr 3, 2025 | Video Generation | Unverified | 0 |