Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1001–1025 of 1149 papers

Title	Date	Tasks	Status
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation?	Mar 20, 2025	DecoderGraph Generation	—Unverified
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets	Jun 1, 2018	Video Understanding	—Unverified
When Work Matters: Transforming Classical Network Structures to Graph CNN	Jul 7, 2018	Graph ClassificationVideo Understanding	—Unverified
WildQA: In-the-Wild Video Question Answering	Sep 14, 2022	Evidence SelectionQuestion Answering	—Unverified
Wolf: Captioning Everything with a World Summarization Framework	Jul 26, 2024	Autonomous DrivingMixture-of-Experts	—Unverified
WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning	May 6, 2024	Multiple-choiceVideo Understanding	—Unverified
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs	Feb 6, 2025	Video Understanding	—Unverified
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding	Jan 12, 2025	Video Understanding	—Unverified
YouMVOS: An Actor-Centric Multi-Shot Video Object Segmentation Dataset	Jan 1, 2022	ManagementSegmentation	—Unverified
YouTube-8M Video Understanding Challenge Approach and Applications	Jun 26, 2017	Ensemble LearningVideo Understanding	—Unverified
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection	Nov 1, 2023	Action DetectionClassification	—Unverified
Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Oct 18, 2024	Action LocalizationLanguage Modelling	—Unverified
Zero-Shot Action Recognition in Surveillance Videos	Oct 28, 2024	Action RecognitionVideo Understanding	—Unverified
Zero-Shot Action Recognition in Videos: A Survey	Sep 13, 2019	Action RecognitionAction Recognition In Still Images	—Unverified
Zero-Shot Long-Form Video Understanding through Screenplay	Jun 25, 2024	FormQuestion Answering	—Unverified
Zero-shot Shark Tracking and Biometrics from Aerial Imagery	Jan 10, 2025	Video Understanding	—Unverified
Hierarchical Video Frame Sequence Representation with Deep Convolutional Graph Network	Jun 2, 2019	General ClassificationGraph Neural Network	—Unverified
4D Generic Video Object Proposals	Jan 26, 2019	Instance SegmentationObject	CodeCode Available
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models	Aug 26, 2024	Large Language ModelVideo Quality Assessment	CodeCode Available
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	CodeCode Available
A Context-Aware Loss Function for Action Spotting in Soccer Videos	Dec 3, 2019	Action SpottingVideo Understanding	CodeCode Available
Learnable pooling with Context Gating for video classification	Jun 21, 2017	ClassificationClustering	CodeCode Available
Learnable Pooling Methods for Video Classification	Oct 1, 2018	ClassificationGeneral Classification	CodeCode Available
Leaping Into Memories: Space-Time Deep Feature Synthesis	Mar 17, 2023	DiversityVideo Understanding	CodeCode Available
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing	Mar 13, 2025	EgoSchemaForm	CodeCode Available

Show:10 25 50

← PrevPage 41 of 46Next →

No leaderboard results yet.