| GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation | Jun 19, 2024 | Benchmarking, Image Generation | Code Available | 3 |
| ARDuP: Active Region Video Diffusion for Universal Policies | Jun 19, 2024 | Decision Making, Sequential Decision Making | Unverified | 0 |
| Splatter a Video: Video Gaussian Representation for Versatile Processing | Jun 19, 2024 | Depth Estimation, Depth Prediction | Unverified | 0 |
| Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion | Jun 17, 2024 | Video Generation | Code Available | 0 |
| NLDF: Neural Light Dynamic Fields for Efficient 3D Talking Head Generation | Jun 17, 2024 | Knowledge Distillation, NeRF | Unverified | 0 |
| ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models | Jun 16, 2024 | Video Generation | Code Available | 2 |
| VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs | Jun 14, 2024 | Anomaly Detection, Benchmarking | Code Available | 1 |
| Training-free Camera Control for Video Generation | Jun 14, 2024 | Data Augmentation, Video Generation | Unverified | 0 |
| OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Jun 13, 2024 | Video Generation, Video Prediction | Code Available | 3 |
| Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Jun 13, 2024 | Benchmarking, Question Answering | Code Available | 2 |
| Hierarchical Patch Diffusion Models for High-Resolution Video Generation | Jun 12, 2024 | Video Generation | Unverified | 0 |
| TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation | Jun 12, 2024 | Benchmarking, Image Generation | Code Available | 1 |
| DiTFastAttn: Attention Compression for Diffusion Transformer Models | Jun 12, 2024 | 2k, Image Generation | Unverified | 0 |
| Vivid-ZOO: Multi-View Video Generation with Diffusion Model | Jun 12, 2024 | Video Generation | Unverified | 0 |
| HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness | Jun 11, 2024 | Object, Video Editing | Unverified | 0 |
| AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation | Jun 11, 2024 | Audio Generation, Video Generation | Unverified | 0 |
| Visual Representation Learning with Stochastic Frame Prediction | Jun 11, 2024 | Decoder, Pose Tracking | Unverified | 0 |
| 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models | Jun 11, 2024 | Scene Generation, Video Generation | Unverified | 0 |
| Compositional Video Generation as Flow Equalization | Jun 10, 2024 | Video Editing, Video Generation | Code Available | 2 |
| Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion | Jun 9, 2024 | Autonomous Driving, Object | Code Available | 1 |
| MotionClone: Training-Free Motion Cloning for Controllable Video Generation | Jun 8, 2024 | Denoising, Motion Generation | Code Available | 4 |
| CoNo: Consistency Noise Injection for Tuning-free Long Video Diffusion | Jun 7, 2024 | Scheduling, Video Generation | Unverified | 0 |
| Zero-Shot Video Editing through Adaptive Sliding Score Distillation | Jun 7, 2024 | Denoising, Text-to-Video Generation | Unverified | 0 |
| ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Jun 6, 2024 | Video Captioning, Video Generation | Code Available | 5 |
| SF-V: Single Forward Video Generation Model | Jun 6, 2024 | Denoising, model | Code Available | 2 |
| GenAI Arena: An Open Evaluation Platform for Generative Models | Jun 6, 2024 | Image Generation, Instruction Following | Code Available | 2 |
| VideoTetris: Towards Compositional Text-to-Video Generation | Jun 6, 2024 | Denoising, Text-to-Video Generation | Code Available | 3 |
| VideoPhy: Evaluating Physical Commonsense for Video Generation | Jun 5, 2024 | Video Generation | Unverified | 0 |
| Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control | Jun 5, 2024 | Image Animation, Video Generation | Unverified | 0 |
| CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation | Jun 4, 2024 | Image to Video Generation, Video Generation | Unverified | 0 |
| I4VGen: Image as Free Stepping Stone for Text-to-Video Generation | Jun 4, 2024 | Diversity, Image Generation | Unverified | 0 |
| ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation | Jun 4, 2024 | Quantization, Video Generation | Code Available | 1 |
| V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation | Jun 4, 2024 | Video Generation | Unverified | 0 |
| UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Jun 3, 2024 | Image Animation, Video Generation | Code Available | 4 |
| Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation | Jun 3, 2024 | Autonomous Driving, Video Generation | Unverified | 0 |
| Learning Temporally Consistent Video Depth from Video Diffusion Priors | Jun 3, 2024 | Depth Estimation, Novel View Synthesis | Unverified | 0 |
| ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation | Jun 3, 2024 | GPU, Video Generation | Code Available | 2 |
| EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing | Jun 2, 2024 | De-identification, Privacy Preserving | Code Available | 1 |
| 4Diffusion: Multi-view Video Diffusion Model for 4D Generation | May 31, 2024 | NeRF, Video Generation | Unverified | 0 |
| DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | May 30, 2024 | DeepFake Detection, Mamba | Code Available | 2 |
| MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model | May 30, 2024 | Image Animation, Video Generation | Code Available | 4 |
| Promptus: Can Prompts Streaming Replace Video Streaming with Stable Diffusion | May 30, 2024 | Semantic Communication, Video Compression | Code Available | 2 |
| MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | May 30, 2024 | Denoising, GPU | Code Available | 3 |
| Improving the Training of Rectified Flows | May 30, 2024 | Image Generation, Knowledge Distillation | Code Available | 2 |
| EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture | May 29, 2024 | Image Generation, Video Generation | Code Available | 7 |
| T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback | May 29, 2024 | Video Generation | Code Available | 3 |
| MAVIN: Multi-Action Video Generation with Diffusion Models via Transition Video Infilling | May 28, 2024 | Video Generation | Code Available | 1 |
| MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation | May 28, 2024 | Video Generation | Code Available | 0 |
| VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers | May 28, 2024 | Denoising, Video Generation | Unverified | 0 |
| EG4D: Explicit Generation of 4D Object without Score Distillation | May 28, 2024 | Dynamic Reconstruction, Video Generation | Code Available | 1 |