| Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions | Jul 27, 2024 | Computational Efficiency, Video Generation | Unverified | 0 |
| SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | Jul 24, 2024 | NeRF, Novel View Synthesis | Unverified | 0 |
| HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation | Jul 24, 2024 | Benchmarking, Human Animation | Code Available | 3 |
| Fréchet Video Motion Distance: A Metric for Evaluating Motion Consistency in Videos | Jul 23, 2024 | Image Generation, Point Tracking | Code Available | 2 |
| MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | Jul 23, 2024 | Video Generation | Unverified | 0 |
| Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data | Jul 23, 2024 | Video Generation | Unverified | 0 |
| Anchored Diffusion for Video Face Reenactment | Jul 21, 2024 | Face Reenactment, Video Generation | Unverified | 0 |
| T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation | Jul 19, 2024 | Attribute, Language Modeling | Code Available | 2 |
| Unlearning Concepts from Text-to-Video Diffusion Models | Jul 19, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Multi-sentence Video Grounding for Long Video Generation | Jul 18, 2024 | Moment Retrieval, Retrieval | Unverified | 0 |
| Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion | Jul 18, 2024 | Imputation, Video Generation | Unverified | 0 |
| Towards Understanding Unsafe Video Generation | Jul 17, 2024 | Image Generation, Video Generation | Code Available | 0 |
| VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control | Jul 17, 2024 | Video Generation | Unverified | 0 |
| A Survey of Defenses against AI-generated Visual Media: Detection, Disruption, and Authentication | Jul 15, 2024 | Fairness, Image Generation | Unverified | 0 |
| IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Jul 15, 2024 | Denoising, Depth Estimation | Code Available | 2 |
| Learning Online Scale Transformation for Talking Head Video Generation | Jul 13, 2024 | Face Reenactment, Video Generation | Unverified | 0 |
| Bora: Biomedical Generalist Video Generation Model | Jul 12, 2024 | Cell Tracking, Data Augmentation | Unverified | 0 |
| Inference Optimization of Foundation Models on AI Accelerators | Jul 12, 2024 | Inference Optimization, Model Compression | Unverified | 0 |
| Still-Moving: Customized Video Generation without Customized Video Data | Jul 11, 2024 | Video Generation | Unverified | 0 |
| A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights | Jul 11, 2024 | Motion Generation, Survey | Code Available | 3 |
| E2VIDiff: Perceptual Events-to-Video Reconstruction using Diffusion Priors | Jul 11, 2024 | Image Generation, Video Generation | Unverified | 0 |
| Video-to-Audio Generation with Hidden Alignment | Jul 10, 2024 | Audio Generation, Data Augmentation | Unverified | 0 |
| VEnhancer: Generative Space-Time Enhancement for Video Generation | Jul 10, 2024 | Data Augmentation, Super-Resolution | Unverified | 0 |
| Mobius: A High Efficient Spatial-Temporal Parallel Training Paradigm for Text-to-Video Generation Task | Jul 9, 2024 | GPU, Text-to-Video Generation | Code Available | 0 |
| MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions | Jul 8, 2024 | Video Alignment, Video Generation | Code Available | 4 |
| VIMI: Grounding Video Generation through Multi-modal Instruction | Jul 8, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| The Tug-of-War Between Deepfake Generation and Detection | Jul 8, 2024 | Face Swapping, Misinformation | Unverified | 0 |
| T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models | Jul 8, 2024 | Video Generation | Unverified | 0 |
| This&That: Language-Gesture Controlled Video Generation for Robot Planning | Jul 8, 2024 | Task Planning, Video Generation | Unverified | 0 |
| LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jul 3, 2024 | Computational Efficiency, Face Reenactment | Code Available | 11 |
| GVDIFF: Grounded Text-to-Video Generation with Diffusion Models | Jul 2, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation | Jul 2, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Evaluation of Text-to-Video Generation Models: A Dynamics Perspective | Jul 1, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix | Jun 29, 2024 | Denoising, Video Generation | Unverified | 0 |
| MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance | Jun 28, 2024 | Image Generation, Video Generation | Code Available | 1 |
| What Matters in Detecting AI-Generated Videos like Sora? | Jun 27, 2024 | Optical Flow Estimation, Video Generation | Unverified | 0 |
| ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Jun 26, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers | Jun 25, 2024 | Image Generation, Model Compression | Code Available | 2 |
| MotionBooth: Motion-Aware Customized Text-to-Video Generation | Jun 25, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| Text-Animator: Controllable Visual Text Video Generation | Jun 25, 2024 | Text Generation, Video Generation | Unverified | 0 |
| Video-Infinity: Distributed Long Video Generation | Jun 24, 2024 | GPU, Video Generation | Unverified | 0 |
| FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models | Jun 24, 2024 | Video Generation | Code Available | 2 |
| Dreamitate: Real-World Visuomotor Policy Learning via Video Generation | Jun 24, 2024 | Video Generation | Unverified | 0 |
| MVOC: a training-free multiple video object composition method with diffusion models | Jun 22, 2024 | Image to Video Generation, Object | Code Available | 1 |
| Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model | Jun 22, 2024 | Attribute, Image to Video Generation | Unverified | 0 |
| VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | Jun 21, 2024 | Video Generation, Video Quality Assessment | Unverified | 0 |
| Video Generation with Learned Action Prior | Jun 20, 2024 | Image Generation, Image Reconstruction | Unverified | 0 |
| ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning | Jun 20, 2024 | GPU, Video Generation | Code Available | 0 |
| Fantastic Copyrighted Beasts and How (Not) to Generate Them | Jun 20, 2024 | Image Generation, Video Generation | Code Available | 1 |
| SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Jun 20, 2024 | Safety Alignment, Text-to-Video Generation | Code Available | 1 |