Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 226–250 of 1149 papers

Title	Date	Tasks	Status	Hype
Do Language Models Understand Time?	Dec 18, 2024	Action RecognitionAnomaly Detection	CodeCode Available	1
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos	Aug 18, 2023	point cloud video understandingSelf-Supervised Learning	CodeCode Available	1
Localizing Moments in Long Video Via Multimodal Guidance	Feb 26, 2023	Natural Language Moment RetrievalNatural Language Visual Grounding	CodeCode Available	1
Long Movie Clip Classification with State-Space Video Models	Apr 4, 2022	ClassificationDecoder	CodeCode Available	1
Lightweight Network Architecture for Real-Time Action Recognition	May 21, 2019	Action RecognitionCPU	CodeCode Available	1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge	Sep 30, 2022	DescriptiveRepresentation Learning	CodeCode Available	1
Learning the Predictability of the Future	Jun 19, 2021	Representation LearningSelf-Supervised Action Recognition	CodeCode Available	1
Procedure-Aware Pretraining for Instructional Video Understanding	Mar 31, 2023	Video Understanding	CodeCode Available	1
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection	Dec 9, 2021	Boundary DetectionDiversity	CodeCode Available	1
Learning Temporally Latent Causal Processes from General Temporal Data	Sep 29, 2021	Causal DiscoveryDisentanglement	CodeCode Available	1
Leveraging triplet loss for unsupervised action segmentation	Apr 13, 2023	Action SegmentationClustering	CodeCode Available	1
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding	Jul 8, 2025	Autonomous DrivingVideo Understanding	CodeCode Available	1
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous DrivingBenchmarking	CodeCode Available	1
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions	Apr 21, 2022	Action DetectionVideo Understanding	CodeCode Available	1
Dual-path Adaptation from Image to Video Transformers	Mar 17, 2023	Action ClassificationAction Recognition	CodeCode Available	1
Learning Optical Flow with Adaptive Graph Reasoning	Feb 8, 2022	Motion EstimationOptical Flow Estimation	CodeCode Available	1
Relational Self-Attention: What's Missing in Attention for Video Understanding	Nov 2, 2021	Action RecognitionTemporal Action Localization	CodeCode Available	1
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization	Mar 24, 2021	Action LocalizationTemporal Action Localization	CodeCode Available	1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner	Jun 18, 2022	DecoderSemantic Segmentation	CodeCode Available	1
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment	Jan 1, 2025	audio-visual learningKnowledge Graphs	CodeCode Available	1
Compositional Video Understanding with Spatiotemporal Structure-based Transformers	Jan 1, 2024	Video Understanding	CodeCode Available	1
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchemaQuestion Answering	CodeCode Available	1
Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition	Jan 1, 2021	Action RecognitionVideo Understanding	CodeCode Available	1
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding	Mar 27, 2025	FormLanguage Modeling	CodeCode Available	1
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs	Apr 21, 2025	Video Understanding	CodeCode Available	1

Show:10 25 50

← PrevPage 10 of 46Next →

No leaderboard results yet.