| Multi-sentence Video Grounding for Long Video Generation | Jul 18, 2024 | Moment Retrieval, Retrieval | Unverified | 0 |
| Rethinking the Architecture Design for Efficient Generic Event Boundary Detection | Jul 17, 2024 | Boundary Detection, Generic Event Boundary Detection | Code Available | 1 |
| InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models | Jul 15, 2024 | Object, Video Editing | Unverified | 0 |
| Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN | Jul 8, 2024 | Disentanglement, Video Editing | Unverified | 0 |
| Transformer-based Image and Video Inpainting: Current Challenges and Future Directions | Jun 28, 2024 | Image Inpainting, Video Editing | Unverified | 0 |
| Diffusion Model-Based Video Editing: A Survey | Jun 26, 2024 | model, Survey | Code Available | 3 |
| MVOC: a training-free multiple video object composition method with diffusion models | Jun 22, 2024 | Image to Video Generation, Object | Code Available | 1 |
| A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Jun 20, 2024 | Video Editing | Code Available | 3 |
| V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data | Jun 20, 2024 | Attribute, Video Editing | Unverified | 0 |
| VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing | Jun 18, 2024 | Video Editing | Unverified | 0 |