| What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation? | Mar 20, 2025 | Decoder, Graph Generation | —Unverified | 0 | 0 |
| What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets | Jun 1, 2018 | Video Understanding | —Unverified | 0 | 0 |
| When Work Matters: Transforming Classical Network Structures to Graph CNN | Jul 7, 2018 | Graph Classification, Video Understanding | —Unverified | 0 | 0 |
| WildQA: In-the-Wild Video Question Answering | Sep 14, 2022 | Evidence Selection, Question Answering | —Unverified | 0 | 0 |
| Wolf: Captioning Everything with a World Summarization Framework | Jul 26, 2024 | Autonomous Driving, Mixture-of-Experts | —Unverified | 0 | 0 |
| WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | May 6, 2024 | Multiple-choice, Video Understanding | —Unverified | 0 | 0 |
| WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs | Feb 6, 2025 | Video Understanding | —Unverified | 0 | 0 |
| X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding | Jan 12, 2025 | Video Understanding | —Unverified | 0 | 0 |
| YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset | Jan 1, 2022 | Management, Segmentation | —Unverified | 0 | 0 |
| YouTube-8M Video Understanding Challenge Approach and Applications | Jun 26, 2017 | Ensemble Learning, Video Understanding | —Unverified | 0 | 0 |
| ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection | Nov 1, 2023 | Action Detection, Classification | —Unverified | 0 | 0 |
| Zero-shot Action Localization via the Confidence of Large Vision-Language Models | Oct 18, 2024 | Action Localization, Language Modelling | —Unverified | 0 | 0 |
| Zero-Shot Action Recognition in Surveillance Videos | Oct 28, 2024 | Action Recognition, Video Understanding | —Unverified | 0 | 0 |
| Zero-Shot Action Recognition in Videos: A Survey | Sep 13, 2019 | Action Recognition, Action Recognition In Still Images | —Unverified | 0 | 0 |
| Zero-Shot Long-Form Video Understanding through Screenplay | Jun 25, 2024 | Form, Question Answering | —Unverified | 0 | 0 |
| Zero-shot Shark Tracking and Biometrics from Aerial Imagery | Jan 10, 2025 | Video Understanding | —Unverified | 0 | 0 |
| Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network | Jun 2, 2019 | General Classification, Graph Neural Network | —Unverified | 0 | 0 |
| Zero-Shot Video Question Answering with Procedural Programs | Dec 1, 2023 | Code Generation, Language Modeling | —Unverified | 0 | 0 |
| 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Jun 8, 2024 | Benchmarking, Instance Segmentation | —Unverified | 0 | 0 |
| Multimodal Fusion and Coherence Modeling for Video Topic Segmentation | Aug 1, 2024 | Contrastive Learning, Mixture-of-Experts | —Unverified | 0 | 0 |
| FE-Adapter: Adapting Image-based Emotion Classifiers to Videos | Aug 5, 2024 | Dynamic Facial Expression Recognition, Emotion Recognition | —Unverified | 0 | 0 |
| An Analysis of Data Transformation Effects on Segment Anything 2 | Feb 25, 2025 | Semantic Segmentation, Video Object Segmentation | —Unverified | 0 | 0 |
| PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos | Feb 28, 2025 | Question Answering, Video Understanding | —Unverified | 0 | 0 |
| 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | Jun 1, 2024 | Autonomous Driving, Panoptic Segmentation | —Unverified | 0 | 0 |
| 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | Dec 10, 2024 | Autonomous Navigation, Spatial Reasoning | —Unverified | 0 | 0 |
| 3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | Jun 6, 2024 | Panoptic Segmentation, Segmentation | —Unverified | 0 | 0 |
| A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives | Mar 5, 2024 | Video Understanding | —Unverified | 0 | 0 |
| Abductive Ego-View Accident Video Understanding for Safe Driving Perception | Mar 1, 2024 | Object, object-detection | —Unverified | 0 | 0 |
| ActAR: Actor-Driven Pose Embeddings for Video Action Recognition | Apr 19, 2022 | Action Recognition, Optical Flow Estimation | —Unverified | 0 | 0 |
| Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions | Mar 11, 2024 | counterfactual, Video Editing | —Unverified | 0 | 0 |
| Action Sensitivity Learning for Temporal Action Localization | May 25, 2023 | Action Localization, Moment Queries | —Unverified | 0 | 0 |
| Action Understanding with Multiple Classes of Actors | Apr 27, 2017 | Action Recognition, Action Segmentation | —Unverified | 0 | 0 |
| Actor-Action Semantic Segmentation with Grouping Process Models | Dec 30, 2015 | Semantic Segmentation, Video Understanding | —Unverified | 0 | 0 |
| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Nov 19, 2024 | GPU, Question Answering | —Unverified | 0 | 0 |
| AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | Jan 1, 2025 | GPU, Question Answering | —Unverified | 0 | 0 |
| AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization | Nov 27, 2019 | Action Classification, Action Recognition | —Unverified | 0 | 0 |
| Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception | Jan 1, 2025 | Autonomous Driving, Gesture Recognition | —Unverified | 0 | 0 |
| Adaptive Intermediate Representations for Video Understanding | Apr 14, 2021 | Action Classification, Optical Flow Estimation | —Unverified | 0 | 0 |
| Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning | Oct 26, 2024 | Video Understanding | —Unverified | 0 | 0 |
| AdaTP: Attention-Debiased Token Pruning for Video Large Language Models | May 26, 2025 | Video Understanding | —Unverified | 0 | 0 |
| A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions | Feb 5, 2025 | Action Quality Assessment, Survey | —Unverified | 0 | 0 |
| Adversarial Machine Learning Attacks Against Video Anomaly Detection Systems | Apr 7, 2022 | Anomaly Detection, BIG-bench Machine Learning | —Unverified | 0 | 0 |
| Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter | Jul 29, 2024 | Action Recognition, Adversarial Robustness | —Unverified | 0 | 0 |
| AE-Net:Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding | Jul 21, 2022 | Action Recognition, Video Understanding | —Unverified | 0 | 0 |
| AFO-TAD: Anchor-free One-Stage Detector for Temporal Action Detection | Oct 18, 2019 | Action Detection, object-detection | —Unverified | 0 | 0 |
| Aggregating Frame-level Features for Large-Scale Video Classification | Jul 4, 2017 | Classification, General Classification | —Unverified | 0 | 0 |
| AirLetters: An Open Video Dataset of Characters Drawn in the Air | Oct 3, 2024 | Video Understanding | —Unverified | 0 | 0 |
| Aligned Better, Listen Better for Audio-Visual Large Language Models | Apr 2, 2025 | Video Understanding | —Unverified | 0 | 0 |
| ALLVB: All-in-One Long Video Understanding Benchmark | Mar 10, 2025 | All, Video Understanding | —Unverified | 0 | 0 |
| AMEGO: Active Memory from long EGOcentric videos | Sep 17, 2024 | Video Understanding | —Unverified | 0 | 0 |