| Sound-Guided Semantic Video Generation | Apr 20, 2022 | Video Editing, Video Generation | —Unverified | 0 | 0 |
| Soundify: Matching Sound Effects to Video | Dec 17, 2021 | Audio Generation, Image Classification | —Unverified | 0 | 0 |
| Spatio-temporal Action Recognition: A Survey | Jan 27, 2019 | Action Detection, Action Localization | —Unverified | 0 | 0 |
| Speech Driven Video Editing via an Audio-Conditioned Diffusion Model | Jan 10, 2023 | Denoising, Face Model | —Unverified | 0 | 0 |
| Speech Prediction in Silent Videos using Variational Autoencoders | Nov 14, 2020 | Prediction, Video Editing | —Unverified | 0 | 0 |
| SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation | May 25, 2025 | Video Editing, Video Generation | —Unverified | 0 | 0 |
| SSDNeRF: Semantic Soft Decomposition of Neural Radiance Fields | Dec 7, 2022 | Video Editing | —Unverified | 0 | 0 |
| Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding | Jun 9, 2025 | Contrastive Learning, Video Editing | —Unverified | 0 | 0 |
| Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges | Dec 4, 2024 | Code Generation, Image Comprehension | —Unverified | 0 | 0 |
| Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets | Sep 2, 2024 | Video Alignment, Video Editing | —Unverified | 0 | 0 |
| Task-agnostic Temporally Consistent Facial Video Editing | Jul 3, 2020 | 3D Reconstruction, Video Editing | —Unverified | 0 | 0 |
| TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs | May 26, 2025 | Benchmarking, Large Language Model | —Unverified | 0 | 0 |
| Temporally Consistent Semantic Video Editing | Jun 21, 2022 | Image Generation, Video Editing | —Unverified | 0 | 0 |
| Text-based Talking Video Editing with Cascaded Conditional Diffusion | Jul 20, 2024 | Video Editing | —Unverified | 0 | 0 |
| FacialFilmroll: High-resolution multi-shot video editing | Oct 5, 2021 | Face Model, Video Editing | —Unverified | 0 | 0 |
| Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs | Jan 10, 2025 | Video Editing | —Unverified | 0 | 0 |
| Text-Video Multi-Grained Integration for Video Moment Montage | Dec 12, 2024 | Sentence, Video Editing | —Unverified | 0 | 0 |
| The ALOS Dataset for Advert Localization in Outdoor Scenes | Apr 16, 2019 | BIG-bench Machine Learning, Marketing | —Unverified | 0 | 0 |
| The Curious Case of End Token: A Zero-Shot Disentangled Image Editing using CLIP | Jun 1, 2024 | Attribute, Video Editing | —Unverified | 0 | 0 |
| Towards Consistent Video Editing with Text-to-Image Diffusion Models | May 27, 2023 | One-Shot Learning, Video Editing | —Unverified | 0 | 0 |
| Towards Data-Driven Automatic Video Editing | Jul 17, 2019 | Imitation Learning, Video Editing | —Unverified | 0 | 0 |
| Training-Free Robust Interactive Video Object Segmentation | Jun 8, 2024 | Interactive Video Object Segmentation, Object | —Unverified | 0 | 0 |
| Trajectory Attention for Fine-grained Video Motion Control | Nov 28, 2024 | Inductive Bias, Video Editing | —Unverified | 0 | 0 |
| Transformer-based Image and Video Inpainting: Current Challenges and Future Directions | Jun 28, 2024 | Image Inpainting, Video Editing | —Unverified | 0 | 0 |
| Understanding Attention Mechanism in Video Diffusion Models | Apr 16, 2025 | Video Editing | —Unverified | 0 | 0 |