
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 601–625 of 1149 papers

Title	Date	Tasks	Status
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?	Mar 20, 2025	Decoder, Graph Generation	Unverified
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets	Jun 1, 2018	Video Understanding	Unverified
When Work Matters: Transforming Classical Network Structures to Graph CNN	Jul 7, 2018	Graph Classification, Video Understanding	Unverified
WildQA: In-the-Wild Video Question Answering	Sep 14, 2022	Evidence Selection, Question Answering	Unverified
Wolf: Captioning Everything with a World Summarization Framework	Jul 26, 2024	Autonomous Driving, Mixture-of-Experts	Unverified
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning	May 6, 2024	Multiple-choice, Video Understanding	Unverified
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs	Feb 6, 2025	Video Understanding	Unverified
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding	Jan 12, 2025	Video Understanding	Unverified
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset	Jan 1, 2022	Management, Segmentation	Unverified
YouTube-8M Video Understanding Challenge Approach and Applications	Jun 26, 2017	Ensemble Learning, Video Understanding	Unverified
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection	Nov 1, 2023	Action Detection, Classification	Unverified
Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Oct 18, 2024	Action Localization, Language Modelling	Unverified
Zero-Shot Action Recognition in Surveillance Videos	Oct 28, 2024	Action Recognition, Video Understanding	Unverified
Zero-Shot Action Recognition in Videos: A Survey	Sep 13, 2019	Action Recognition, Action Recognition In Still Images	Unverified
Zero-Shot Long-Form Video Understanding through Screenplay	Jun 25, 2024	Form, Question Answering	Unverified
Zero-shot Shark Tracking and Biometrics from Aerial Imagery	Jan 10, 2025	Video Understanding	Unverified
Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network	Jun 2, 2019	General Classification, Graph Neural Network	Unverified
Zero-Shot Video Question Answering with Procedural Programs	Dec 1, 2023	Code Generation, Language Modeling	Unverified
1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation	Jun 8, 2024	Benchmarking, Instance Segmentation	Unverified
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation	Aug 1, 2024	Contrastive Learning, Mixture-of-Experts	Unverified
FE-Adapter: Adapting Image-based Emotion Classifiers to Videos	Aug 5, 2024	Dynamic Facial Expression Recognition, Emotion Recognition	Unverified
An Analysis of Data Transformation Effects on Segment Anything 2	Feb 25, 2025	Semantic Segmentation, Video Object Segmentation	Unverified
PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos	Feb 28, 2025	Question Answering, Video Understanding	Unverified
2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation	Jun 1, 2024	Autonomous Driving, Panoptic Segmentation	Unverified
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark	Dec 10, 2024	Autonomous Navigation, Spatial Reasoning	Unverified

