| Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding | Jun 1, 2022 | Knowledge Graphs, Video Understanding | Unverified | 0 | 0 |
| Discerning Generic Event Boundaries in Long-Form Wild Videos | Jun 18, 2021 | Boundary Detection, Form | Unverified | 0 | 0 |
| Discrete neural representations for explainable anomaly detection | Dec 10, 2021 | Anomaly Detection, Object | Unverified | 0 | 0 |
| Disentangle and denoise: Tackling context misalignment for video moment retrieval | Aug 14, 2024 | Denoising, Disentanglement | Unverified | 0 | 0 |
| Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding | Oct 31, 2021 | Action Recognition, Text Detection | Unverified | 0 | 0 |
| DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning | Aug 29, 2024 | Multi-Task Learning, Prompt Learning | Unverified | 0 | 0 |
| DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition | Jan 11, 2019 | Action Classification, Action Recognition | Unverified | 0 | 0 |
| DOAD: Decoupled One Stage Action Detection Network | Apr 1, 2023 | Action Detection, Action Recognition | Unverified | 0 | 0 |
| DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering | Mar 20, 2025 | Contrastive Learning, Question Answering | Unverified | 0 | 0 |
| Domain Adaptation of VLM for Soccer Video Understanding | May 20, 2025 | Action Classification, Domain Adaptation | Unverified | 0 | 0 |
| DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation | Jul 31, 2023 | Action Segmentation, Human-Object Interaction Detection | Unverified | 0 | 0 |
| Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning | Jan 1, 2024 | object-detection, Object Detection | Unverified | 0 | 0 |
| DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model | Oct 2, 2023 | Autonomous Driving, Language Modeling | Unverified | 0 | 0 |
| DrVideo: Document Retrieval Based Long Video Understanding | Jun 18, 2024 | document understanding, EgoSchema | Unverified | 0 | 0 |
| Dilated Temporal Relational Adversarial Network for Generic Video Summarization | Apr 30, 2018 | Generative Adversarial Network, Video Summarization | Unverified | 0 | 0 |
| DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | Oct 3, 2024 | Object Tracking, Video Understanding | Unverified | 0 | 0 |
| DualX-VSR: Dual Axial Spatial×Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation | Jun 5, 2025 | Motion Compensation, Optical Flow Estimation | Unverified | 0 | 0 |
| DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | Apr 23, 2025 | Token Reduction, Video Understanding | Unverified | 0 | 0 |
| Dynamic Appearance: A Video Representation for Action Recognition with Joint Training | Nov 23, 2022 | Action Recognition, Temporal Action Localization | Unverified | 0 | 0 |
| Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition | Dec 13, 2018 | 3D Action Recognition, Action Recognition | Unverified | 0 | 0 |
| Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering | Jul 1, 2022 | Question Answering, Video Question Answering | Unverified | 0 | 0 |
| DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding | Nov 19, 2024 | Question Answering, Video Understanding | Unverified | 0 | 0 |
| DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding | Jun 4, 2025 | MME, Video MME | Unverified | 0 | 0 |
| EAGLE: Egocentric AGgregated Language-video Engine | Sep 26, 2024 | Action Recognition, Activity Recognition | Unverified | 0 | 0 |
| Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey | Jun 5, 2022 | 3D Hand Pose Estimation, Domain Adaptation | Unverified | 0 | 0 |
| Efficient Modelling Across Time of Human Actions and Interactions | Oct 5, 2021 | Action Recognition, Video Understanding | Unverified | 0 | 0 |
| Efficient Motion-Aware Video MLLM | Jan 1, 2025 | Question Answering, Video Question Answering | Unverified | 0 | 0 |
| Efficient Video Understanding via Layered Multi Frame-Rate Analysis | Nov 24, 2018 | Autonomous Driving, Video Understanding | Unverified | 0 | 0 |
| EgoEnv: Human-centric environment representations from egocentric video | Jul 22, 2022 | Video Understanding | Unverified | 0 | 0 |
| Egocentric Video Task Translation | Dec 13, 2022 | Multi-Task Learning, Translation | Unverified | 0 | 0 |
| EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding | Jan 5, 2023 | Video Understanding | Unverified | 0 | 0 |
| Egok360: A 360 Egocentric Kinetic Human Activity Video Dataset | Oct 15, 2020 | Activity Recognition, Egocentric Activity Recognition | Unverified | 0 | 0 |
| Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation | Jul 28, 2024 | Video Understanding | Unverified | 0 | 0 |
| ElasticPlay: Interactive Video Summarization with Dynamic Time Budgets | Aug 23, 2017 | Video Summarization, Video Understanding | Unverified | 0 | 0 |
| Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding | Dec 31, 2024 | Robot Manipulation, Scene Understanding | Unverified | 0 | 0 |
| EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Jul 14, 2025 | Scene Understanding, Spatial Reasoning | Unverified | 0 | 0 |
| Empowering Agentic Video Analytics Systems with Video Language Models | May 1, 2025 | Knowledge Graphs, RAG | Unverified | 0 | 0 |
| End-to-end Generative Pretraining for Multimodal Video Captioning | Jan 20, 2022 | Action Classification, Decoder | Unverified | 0 | 0 |
| End-to-End Joint Semantic Segmentation of Actors and Actions in Video | Sep 1, 2018 | Action Recognition, Segmentation | Unverified | 0 | 0 |
| End-to-End Video Classification with Knowledge Graphs | Nov 6, 2017 | BIG-bench Machine Learning, Classification | Unverified | 0 | 0 |
| Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning | Jan 1, 2024 | Transfer Learning, Video Understanding | Unverified | 0 | 0 |
| Enhancing Long Video Understanding via Hierarchical Event-Based Memory | Sep 10, 2024 | Video Understanding | Unverified | 0 | 0 |
| Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | Oct 9, 2024 | Audio captioning, Large Language Model | Unverified | 0 | 0 |
| Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training | Mar 18, 2021 | Video Understanding | Unverified | 0 | 0 |
| Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Feb 11, 2025 | Action Recognition, Video Description | Unverified | 0 | 0 |
| Espresso: High Compression For Rich Extraction From Videos for Your Vision-Language Model | Dec 6, 2024 | EgoSchema, Language Modeling | Unverified | 0 | 0 |
| EVA: An Embodied World Model for Future Video Anticipation | Oct 20, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | Mar 12, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| EVQAScore: Efficient Video Question Answering Data Evaluation | Nov 11, 2024 | Keyword Extraction, Question Answering | Unverified | 0 | 0 |
| Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding | Mar 12, 2025 | Instruction Following, Video Understanding | Unverified | 0 | 0 |