| Real-Time Person Image Synthesis Using a Flow Matching Model | May 6, 2025 | Image Generation, Video Generation | Code available | 0 |
| DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization | May 4, 2025 | Denoising, Text-to-Video Generation | Unverified | 0 |
| PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | May 3, 2025 | Autonomous Driving, Camera Pose Estimation | Unverified | 0 |
| VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding | May 2, 2025 | Anomaly Detection, Common Sense Reasoning | Code available | 1 |
| FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis | May 2, 2025 | Video Generation | Code available | 1 |
| T2VPhysBench: A First-Principles Benchmark for Physical Consistency in Text-to-Video Generation | May 1, 2025 | counterfactual, Instruction Following | Unverified | 0 |
| HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation | Apr 30, 2025 | Depth Estimation, Scene Generation | Code available | 2 |
| Eye2Eye: A Simple Approach for Monocular-to-Stereo Video Synthesis | Apr 30, 2025 | Disparity Estimation, Transparent objects | Unverified | 0 |
| ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction | Apr 30, 2025 | Video Generation | Unverified | 0 |
| Capturing Conditional Dependence via Auto-regressive Diffusion Models | Apr 30, 2025 | Video Generation | Unverified | 0 |
| TesserAct: Learning 4D Embodied World Models | Apr 29, 2025 | Novel View Synthesis, Video Generation | Unverified | 0 |
| DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer | Apr 28, 2025 | Video Generation | Unverified | 0 |
| Stealing Creator's Workflow: A Creator-Inspired Agentic Framework with Iterative Feedback Loop for Improved Scientific Short-form Generation | Apr 26, 2025 | Form, Video Generation | Unverified | 0 |
| We'll Fix it in Post: Improving Text-to-Video Generation with Neuro-Symbolic Feedback | Apr 24, 2025 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Subject-driven Video Generation via Disentangled Identity and Motion | Apr 23, 2025 | Subject-driven Video Generation, Video Generation | Unverified | 0 |
| ManipDreamer: Boosting Robotic Manipulation World Model with Action Tree and Visual Guidance | Apr 23, 2025 | Instruction Following, SSIM | Unverified | 0 |
| DiTPainter: Efficient Video Inpainting with Diffusion Transformers | Apr 22, 2025 | Video Generation, Video Inpainting | Unverified | 0 |
| Survey of Video Diffusion Models: Foundations, Implementations, and Applications | Apr 22, 2025 | Computational Efficiency, Denoising | Code available | 1 |
| Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning | Apr 22, 2025 | Large Language Model, reinforcement-learning | Unverified | 0 |
| Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation | Apr 21, 2025 | Video Generation | Code available | 4 |
| Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform | Apr 21, 2025 | Boundary Detection, Optical Character Recognition (OCR) | Unverified | 0 |
| DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation | Apr 21, 2025 | Attribute, Denoising | Unverified | 0 |
| Turbo2K: Towards Ultra-Efficient and High-Quality 2K Video Synthesis | Apr 20, 2025 | 2k, Knowledge Distillation | Unverified | 0 |
| SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation | Apr 19, 2025 | ERP, Video Generation | Code available | 2 |
| SkyReels-V2: Infinite-length Film Generative Model | Apr 17, 2025 | Large Language Model, model | Code available | 9 |
| The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation | Apr 16, 2025 | Sentence, Text-to-Video Generation | Unverified | 0 |
| VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate | Apr 16, 2025 | Video Generation | Code available | 1 |
| Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM | Apr 16, 2025 | Large Language Model, Text-to-Video Generation | Unverified | 0 |
| OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | Apr 15, 2025 | Semantic Segmentation, Video Generation | Unverified | 0 |
| VideoPanda: Video Panoramic Diffusion with Multi-view Attention | Apr 15, 2025 | Video Generation | Unverified | 0 |
| InterAnimate: Taming Region-aware Diffusion Model for Realistic Human Interaction Animation | Apr 15, 2025 | Denoising, Video Generation | Unverified | 0 |
| Aligning Anime Video Generation with Human Feedback | Apr 14, 2025 | Video Generation | Code available | 7 |
| FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos | Apr 14, 2025 | Video Generation | Unverified | 0 |
| H-MoRe: Learning Human-centric Motion Representation for Action Analysis | Apr 14, 2025 | Action Analysis, Action Recognition | Code available | 0 |
| H3AE: High Compression, High Speed, and High Quality AutoEncoder for Video Diffusion Models | Apr 14, 2025 | Denoising, Text-to-Video Generation | Unverified | 0 |
| CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models | Apr 13, 2025 | Video Editing, Video Generation | Unverified | 0 |
| RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Apr 11, 2025 | Video Generation | Code available | 2 |
| EasyGenNet: An Efficient Framework for Audio-Driven Gesture Video Generation Based on Diffusion Model | Apr 11, 2025 | Gesture Generation, Video Generation | Unverified | 0 |
| Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model | Apr 11, 2025 | GPU, Video Generation | Unverified | 0 |
| TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation | Apr 11, 2025 | Disentanglement, Video Generation | Unverified | 0 |
| Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Apr 11, 2025 | Denoising, Object | Unverified | 0 |
| Diffusion Models for Robotic Manipulation: A Survey | Apr 11, 2025 | Data Augmentation, Image Augmentation | Unverified | 0 |
| Diffusion Transformers for Tabular Data Time Series Generation | Apr 10, 2025 | Tabular Data Generation, Time Series | Code available | 1 |
| Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos | Apr 10, 2025 | Question Answering, Video Generation | Unverified | 0 |
| DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation | Apr 9, 2025 | Image Generation, Text to Image Generation | Code available | 1 |
| CamContextI2V: Context-aware Controllable Video Generation | Apr 8, 2025 | Diversity, Scene Understanding | Code available | 1 |
| One-Minute Video Generation with Test-Time Training | Apr 7, 2025 | Mamba, Video Generation | Unverified | 0 |
| Video4DGen: Enhancing Video and 4D Generation through Mutual Optimization | Apr 5, 2025 | 3D Generation, Video Alignment | Code available | 3 |
| Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models | Apr 4, 2025 | Denoising, Video Generation | Code available | 1 |
| Morpheus: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments | Apr 3, 2025 | Physical Commonsense Reasoning, Video Generation | Unverified | 0 |