Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 601–650 of 1149 papers

Title	Date	Tasks	Status
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?	Mar 20, 2025	Decoder, Graph Generation	Unverified
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets	Jun 1, 2018	Video Understanding	Unverified
When Work Matters: Transforming Classical Network Structures to Graph CNN	Jul 7, 2018	Graph Classification, Video Understanding	Unverified
WildQA: In-the-Wild Video Question Answering	Sep 14, 2022	Evidence Selection, Question Answering	Unverified
Wolf: Captioning Everything with a World Summarization Framework	Jul 26, 2024	Autonomous Driving, Mixture-of-Experts	Unverified
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning	May 6, 2024	Multiple-choice, Video Understanding	Unverified
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs	Feb 6, 2025	Video Understanding	Unverified
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding	Jan 12, 2025	Video Understanding	Unverified
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset	Jan 1, 2022	Management, Segmentation	Unverified
YouTube-8M Video Understanding Challenge Approach and Applications	Jun 26, 2017	Ensemble Learning, Video Understanding	Unverified
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection	Nov 1, 2023	Action Detection, Classification	Unverified
Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Oct 18, 2024	Action Localization, Language Modelling	Unverified
Zero-Shot Action Recognition in Surveillance Videos	Oct 28, 2024	Action Recognition, Video Understanding	Unverified
Zero-Shot Action Recognition in Videos: A Survey	Sep 13, 2019	Action Recognition, Action Recognition In Still Images	Unverified
Zero-Shot Long-Form Video Understanding through Screenplay	Jun 25, 2024	Form, Question Answering	Unverified
Zero-shot Shark Tracking and Biometrics from Aerial Imagery	Jan 10, 2025	Video Understanding	Unverified
Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network	Jun 2, 2019	General Classification, Graph Neural Network	Unverified
Zero-Shot Video Question Answering with Procedural Programs	Dec 1, 2023	Code Generation, Language Modeling	Unverified
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation	Jun 8, 2024	Benchmarking, Instance Segmentation	Unverified
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation	Aug 1, 2024	Contrastive Learning, Mixture-of-Experts	Unverified
FE-Adapter: Adapting Image-based Emotion Classifiers to Videos	Aug 5, 2024	Dynamic Facial Expression Recognition, Emotion Recognition	Unverified
An Analysis of Data Transformation Effects on Segment Anything 2	Feb 25, 2025	Semantic Segmentation, Video Object Segmentation	Unverified
PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos	Feb 28, 2025	Question Answering, Video Understanding	Unverified
2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 1, 2024	Autonomous Driving, Panoptic Segmentation	Unverified
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark	Dec 10, 2024	Autonomous Navigation, Spatial Reasoning	Unverified
3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 6, 2024	Panoptic Segmentation, Segmentation	Unverified
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives	Mar 5, 2024	Video Understanding	Unverified
Abductive Ego-View Accident Video Understanding for Safe Driving Perception	Mar 1, 2024	Object, object-detection	Unverified
ActAR: Actor-Driven Pose Embeddings for Video Action Recognition	Apr 19, 2022	Action Recognition, Optical Flow Estimation	Unverified
Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions	Mar 11, 2024	counterfactual, Video Editing	Unverified
Action Sensitivity Learning for Temporal Action Localization	May 25, 2023	Action Localization, Moment Queries	Unverified
Action Understanding with Multiple Classes of Actors	Apr 27, 2017	Action Recognition, Action Segmentation	Unverified
Actor-Action Semantic Segmentation with Grouping Process Models	Dec 30, 2015	Semantic Segmentation, Video Understanding	Unverified
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Nov 19, 2024	GPU, Question Answering	Unverified
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Jan 1, 2025	GPU, Question Answering	Unverified
AdapNet: Adaptability Decomposing Encoder-Decoder Network for Weakly Supervised Action Recognition and Localization	Nov 27, 2019	Action Classification, Action Recognition	Unverified
Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception	Jan 1, 2025	Autonomous Driving, Gesture Recognition	Unverified
Adaptive Intermediate Representations for Video Understanding	Apr 14, 2021	Action Classification, Optical Flow Estimation	Unverified
Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning	Oct 26, 2024	Video Understanding	Unverified
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models	May 26, 2025	Video Understanding	Unverified
A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions	Feb 5, 2025	Action Quality Assessment, Survey	Unverified
Adversarial Machine Learning Attacks Against Video Anomaly Detection Systems	Apr 7, 2022	Anomaly Detection, BIG-bench Machine Learning	Unverified
Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter	Jul 29, 2024	Action Recognition, Adversarial Robustness	Unverified
AE-Net:Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding	Jul 21, 2022	Action Recognition, Video Understanding	Unverified
AFO-TAD: Anchor-free One-Stage Detector for Temporal Action Detection	Oct 18, 2019	Action Detection, object-detection	Unverified
Aggregating Frame-level Features for Large-Scale Video Classification	Jul 4, 2017	Classification, General Classification	Unverified
AirLetters: An Open Video Dataset of Characters Drawn in the Air	Oct 3, 2024	Video Understanding	Unverified
Aligned Better, Listen Better for Audio-Visual Large Language Models	Apr 2, 2025	Video Understanding	Unverified
ALLVB: All-in-One Long Video Understanding Benchmark	Mar 10, 2025	All, Video Understanding	Unverified
AMEGO: Active Memory from long EGOcentric videos	Sep 17, 2024	Video Understanding	Unverified

