| Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era | Sep 3, 2024 | Scene Understanding, Shadow Detection | Code Available | 2 |
| Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets | Sep 2, 2024 | Video Alignment, Video Editing | Unverified | 0 |
| CSS-Segment: 2nd Place Report of LSVOS Challenge VOS Track | Aug 24, 2024 | Autonomous Driving, Object | Unverified | 0 |
| VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment | Aug 21, 2024 | Video Alignment, Video Editing | Code Available | 2 |
| Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos | Aug 20, 2024 | Video Editing | Unverified | 0 |
| Language-Driven Interactive Shadow Detection | Aug 16, 2024 | Descriptive, Shadow Detection | Code Available | 0 |
| DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency | Aug 14, 2024 | text-guided-image-editing, Video Editing | Unverified | 0 |
| Segment Anything for Videos: A Systematic Survey | Jul 31, 2024 | Image Segmentation, Robot Manipulation Generalization | Code Available | 5 |
| Fine-gained Zero-shot Video Sampling | Jul 31, 2024 | Image Generation, Video Editing | Unverified | 0 |
| Text-based Talking Video Editing with Cascaded Conditional Diffusion | Jul 20, 2024 | Video Editing | Unverified | 0 |
| Multi-sentence Video Grounding for Long Video Generation | Jul 18, 2024 | Moment Retrieval, Retrieval | Unverified | 0 |
| Rethinking the Architecture Design for Efficient Generic Event Boundary Detection | Jul 17, 2024 | Boundary Detection, Generic Event Boundary Detection | Code Available | 1 |
| InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models | Jul 15, 2024 | Object, Video Editing | Unverified | 0 |
| Audio-driven High-resolution Seamless Talking Head Video Editing via StyleGAN | Jul 8, 2024 | Disentanglement, Video Editing | Unverified | 0 |
| Transformer-based Image and Video Inpainting: Current Challenges and Future Directions | Jun 28, 2024 | Image Inpainting, Video Editing | Unverified | 0 |
| Diffusion Model-Based Video Editing: A Survey | Jun 26, 2024 | model, Survey | Code Available | 3 |
| MVOC: a training-free multiple video object composition method with diffusion models | Jun 22, 2024 | Image to Video Generation, Object | Code Available | 1 |
| A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | Jun 20, 2024 | Video Editing | Code Available | 3 |
| V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data | Jun 20, 2024 | Attribute, Video Editing | Unverified | 0 |
| VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing | Jun 18, 2024 | Video Editing | Unverified | 0 |
| VideoGUI: A Benchmark for GUI Automation from Instructional Videos | Jun 14, 2024 | Video Editing | Unverified | 0 |
| COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing | Jun 13, 2024 | Denoising, GPU | Code Available | 1 |
| Instruct 4D-to-4D: Editing 4D Scenes as Pseudo-3D Scenes Using 2D Diffusion | Jun 13, 2024 | Optical Flow Estimation, Video Editing | Unverified | 0 |
| 2nd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation | Jun 12, 2024 | Instance Segmentation, Semantic Segmentation | Unverified | 0 |
| HOI-Swap: Swapping Objects in Videos with Hand-Object Interaction Awareness | Jun 11, 2024 | Object, Video Editing | Unverified | 0 |