| LoopAnimate: Loopable Salient Object Animation | Apr 14, 2024 | GPU, Object | Unverified | 0 |
| Action-conditioned video data improves predictability | Apr 8, 2024 | Video Generation | Unverified | 0 |
| AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment | Apr 7, 2024 | Video Editing, Video Generation | Unverified | 0 |
| MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators | Apr 7, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| CameraCtrl: Enabling Camera Control for Text-to-Video Generation | Apr 2, 2024 | Text-to-Video Generation, Video Generation | Code Available | 4 |
| Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model | Apr 2, 2024 | Video Generation | Code Available | 2 |
| Evaluating Text-to-Visual Generation with Image-to-Text Generation | Apr 1, 2024 | Image to text, Question Answering | Code Available | 3 |
| Grid Diffusion Models for Text-to-Video Generation | Mar 30, 2024 | GPU, Image Generation | Unverified | 0 |
| Motion Inversion for Video Customization | Mar 29, 2024 | Video Generation | Code Available | 2 |
| A Review of Multi-Modal Large Language and Vision Models | Mar 28, 2024 | Image Captioning, Prompt Engineering | Unverified | 0 |
| Frame by Familiar Frame: Understanding Replication in Video Diffusion Models | Mar 28, 2024 | Image Generation, Video Generation | Unverified | 0 |
| Tutorial on Diffusion Models for Imaging and Vision | Mar 26, 2024 | Image Generation, Text to Image Generation | Unverified | 0 |
| Annotated Biomedical Video Generation using Denoising Diffusion Probabilistic Models and Flow Fields | Mar 26, 2024 | Cell Segmentation, Denoising | Code Available | 0 |
| TC4D: Trajectory-Conditioned Text-to-4D Generation | Mar 26, 2024 | Scene Generation, Video Generation | Unverified | 0 |
| TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models | Mar 25, 2024 | Image to Video Generation, Relational Reasoning | Unverified | 0 |
| A Survey on Long Video Generation: Challenges, Methods, and Prospects | Mar 25, 2024 | Survey, Video Generation | Unverified | 0 |
| Opportunities and challenges in the application of large artificial intelligence models in radiology | Mar 24, 2024 | Video Generation | Unverified | 0 |
| Adaptive Super Resolution For One-Shot Talking-Head Generation | Mar 23, 2024 | Decoder, Super-Resolution | Code Available | 2 |
| Spectral Motion Alignment for Video Motion Transfer using Diffusion Models | Mar 22, 2024 | Computational Efficiency, Video Generation | Unverified | 0 |
| StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text | Mar 21, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 |
| Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance | Mar 21, 2024 | Animated GIF Generation, Image Animation | Code Available | 7 |
| AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks | Mar 21, 2024 | Image to Video Generation, Style Transfer | Code Available | 4 |
| Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition | Mar 21, 2024 | Video Generation | Unverified | 0 |
| Explorative Inbetweening of Time and Space | Mar 21, 2024 | Denoising, Video Generation | Unverified | 0 |
| Enabling Visual Composition and Animation in Unsupervised Video Generation | Mar 21, 2024 | Video Generation | Unverified | 0 |
| StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN | Mar 21, 2024 | Unconditional Video Generation, Video Generation | Code Available | 1 |
| Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Mar 20, 2024 | Image to Video Generation, Text-to-Video Generation | Code Available | 5 |
| VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis | Mar 20, 2024 | Generative Temporal Nursing, Text-to-Video Generation | Code Available | 1 |
| S2DM: Sector-Shaped Diffusion Models for Video Generation | Mar 20, 2024 | Image Generation, Optical Flow Estimation | Unverified | 0 |
| AnimateDiff-Lightning: Cross-Model Diffusion Distillation | Mar 19, 2024 | model, Video Generation | Unverified | 0 |
| CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility | Mar 18, 2024 | Image Inpainting, Video Alignment | Code Available | 3 |
| AICL: Action In-Context Learning for Video Diffusion Model | Mar 18, 2024 | Action Generation, In-Context Learning | Code Available | 1 |
| Endora: Video Generation Models as Endoscopy Simulators | Mar 17, 2024 | Data Augmentation, Video Generation | Unverified | 0 |
| DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers | Mar 15, 2024 | Text Generation, Video Generation | Code Available | 7 |
| Animate Your Motion: Turning Still Images into Dynamic Videos | Mar 15, 2024 | Specificity, Text-to-Video Generation | Unverified | 0 |
| Video Editing via Factorized Diffusion Distillation | Mar 14, 2024 | Video Editing, Video Generation | Unverified | 0 |
| Intention-driven Ego-to-Exo Video Generation | Mar 14, 2024 | Optical Flow Estimation, Stereo Matching | Unverified | 0 |
| VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis | Mar 13, 2024 | Face Detection, Video Editing | Unverified | 0 |
| Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | Mar 13, 2024 | Image Animation, Image to Video Generation | Code Available | 4 |
| AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production | Mar 12, 2024 | Image Generation, RAG | Unverified | 0 |
| SSM Meets Video Diffusion Models: Efficient Long-Term Video Generation with Structured State Spaces | Mar 12, 2024 | GPU, Image Generation | Code Available | 1 |
| DragAnything: Motion Control for Anything using Entity Representation | Mar 12, 2024 | Object, Video Generation | Code Available | 7 |
| Video Generation with Consistency Tuning | Mar 11, 2024 | Video Generation | Unverified | 0 |
| DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Mar 11, 2024 | Autonomous Driving, Language Modeling | Code Available | 3 |
| WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs | Mar 10, 2024 | AI Agent, Video Generation | Unverified | 0 |
| VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models | Mar 10, 2024 | Copy Detection, Image Generation | Code Available | 2 |
| BlazeBVD: Make Scale-Time Equalization Great Again for Blind Video Deflickering | Mar 10, 2024 | Video Generation, Video Temporal Consistency | Unverified | 0 |
| FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing | Mar 10, 2024 | Image Generation, Text-to-Video Editing | Unverified | 0 |
| VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models | Mar 8, 2024 | Video Generation | Code Available | 2 |
| Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation | Mar 8, 2024 | Articles, Hallucination | Unverified | 0 |