| Phased Consistency Models | May 28, 2024 | Image Generation, Video Generation | Code Available | 4 |
| RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance | May 27, 2024 | Image Generation, Video Generation | Unverified | 0 |
| ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance | May 27, 2024 | Diffusion Personalization, Video Generation | Code Available | 1 |
| Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control | May 27, 2024 | Scene Generation, Video Generation | Unverified | 0 |
| Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer | May 27, 2024 | Video Generation | Unverified | 0 |
| Controllable Longer Image Animation with Diffusion Models | May 27, 2024 | Image Animation, motion prediction | Unverified | 0 |
| Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation | May 27, 2024 | Video Generation | Unverified | 0 |
| Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability | May 27, 2024 | Autonomous Driving, Video Generation | Code Available | 7 |
| Disentangling Foreground and Background Motion for Enhanced Realism in Human Video Generation | May 26, 2024 | Video Generation | Unverified | 0 |
| Towards Multi-Task Multi-Modal Models: A Video Generative Perspective | May 26, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation | May 24, 2024 | Image Generation, Mamba | Unverified | 0 |
| A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence | May 24, 2024 | Text Generation, Video Generation | Code Available | 0 |
| PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control | May 23, 2024 | Video Generation | Unverified | 0 |
| Fisher Flow Matching for Generative Modeling over Discrete Data | May 23, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes | May 23, 2024 | 3D Generation, Autonomous Driving | Unverified | 0 |
| Video Diffusion Models are Training-free Motion Interpreter and Controller | May 23, 2024 | Video Generation | Code Available | 2 |
| ReVideo: Remake a Video with Motion and Content Control | May 22, 2024 | Video Editing, Video Generation | Unverified | 0 |
| MotionCraft: Physics-based Zero-Shot Video Generation | May 22, 2024 | Image Generation, Missing Elements | Code Available | 1 |
| CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers | May 21, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control | May 21, 2024 | Attribute, Motion Generation | Unverified | 0 |
| OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models | May 21, 2024 | Video Generation | Code Available | 1 |
| FIFO-Diffusion: Generating Infinite Videos from Text without Training | May 19, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| From Sora What We Can See: A Survey of Text-to-Video Generation | May 17, 2024 | Text-to-Video Generation, Video Generation | Code Available | 3 |
| Dance Any Beat: Blending Beats with Visuals in Dance Video Generation | May 15, 2024 | Image to Video Generation, Optical Flow Estimation | Unverified | 0 |
| The Lost Melody: Empirical Observations on Text-to-Video Generation From A Storytelling Perspective | May 13, 2024 | Text-to-Video Generation, Video Generation | Unverified | 0 |
| OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation | May 10, 2024 | 3D Reconstruction, Image to 3D | Code Available | 1 |
| Reviewing Intelligent Cinematography: AI research for camera-based video production | May 8, 2024 | Camera Calibration, object-detection | Unverified | 0 |
| Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method | May 7, 2024 | Video Generation | Unverified | 0 |
| TALC: Time-Aligned Captions for Multi-Scene Text-to-Video Generation | May 7, 2024 | Text-to-Video Generation, Video Generation | Code Available | 1 |
| Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation | May 7, 2024 | Face Generation, Talking Face Generation | Unverified | 0 |
| Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models | May 7, 2024 | Video Generation, Video Prediction | Unverified | 0 |
| Video Diffusion Models: A Survey | May 6, 2024 | Survey, Text-to-Video Generation | Code Available | 2 |
| Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond | May 6, 2024 | Autonomous Driving, Decision Making | Code Available | 4 |
| Matten: Video Generation with Mamba-Attention | May 5, 2024 | Mamba, Video Generation | Unverified | 0 |
| StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation | May 2, 2024 | motion prediction, Story Generation | Code Available | 9 |
| Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model | Apr 30, 2024 | Descriptive, Gesture Generation | Unverified | 0 |
| FlexiFilm: Long Video Generation with Flexible Conditions | Apr 29, 2024 | Image Generation, Video Generation | Code Available | 1 |
| Synthesizing Audio from Silent Video using Sequence to Sequence Modeling | Apr 25, 2024 | Decoder, Diversity | Code Available | 0 |
| TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models | Apr 25, 2024 | Denoising, Image to Video Generation | Code Available | 2 |
| MotionMaster: Training-free Camera Motion Transfer For Video Generation | Apr 24, 2024 | Disentanglement, Motion Disentanglement | Unverified | 0 |
| ID-Animator: Zero-Shot Identity-Preserving Human Video Generation | Apr 23, 2024 | Attribute, Video Generation | Code Available | 3 |
| TAVGBench: Benchmarking Text to Audible-Video Generation | Apr 22, 2024 | Benchmarking, Contrastive Learning | Code Available | 1 |
| Accelerating Image Generation with Sub-path Linear Approximation Model | Apr 22, 2024 | Denoising, GPU | Unverified | 0 |
| Motion-aware Latent Diffusion Models for Video Frame Interpolation | Apr 21, 2024 | Motion Estimation, Video Frame Interpolation | Unverified | 0 |
| Music Consistency Models | Apr 20, 2024 | Computational Efficiency, Music Generation | Unverified | 0 |
| PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation | Apr 19, 2024 | motion prediction, Object | Unverified | 0 |
| On the Content Bias in Fréchet Video Distance | Apr 18, 2024 | Video Generation | Code Available | 3 |
| AniClipart: Clipart Animation with Text-to-Video Priors | Apr 18, 2024 | Image to Video Generation, Text-to-Video Generation | Unverified | 0 |
| SparseDM: Toward Sparse Efficient Diffusion Models | Apr 16, 2024 | GPU, Video Generation | Unverified | 0 |
| Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model | Apr 15, 2024 | GPU, Image Generation | Unverified | 0 |