Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–325 of 1149 papers

Title	Date	Tasks	Status	Hype
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos	Aug 18, 2023	point cloud video understandingSelf-Supervised Learning	CodeCode Available	1
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding	Jul 8, 2025	Autonomous DrivingVideo Understanding	CodeCode Available	1
Towards Visually Explaining Video Understanding Networks with Perturbation	May 1, 2020	Video Understanding	CodeCode Available	1
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation	Jun 15, 2025	ObjectSemantic Segmentation	CodeCode Available	1
ETAD: Training Action Detection End to End on a Laptop	May 14, 2022	Action DetectionGPU	CodeCode Available	1
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation	Mar 25, 2025	HallucinationHallucination Evaluation	CodeCode Available	1
EPIC Fields: Marrying 3D Geometry and Video Understanding	Jun 14, 2023	3D geometryNeural Rendering	CodeCode Available	1
Long Movie Clip Classification with State-Space Video Models	Apr 4, 2022	ClassificationDecoder	CodeCode Available	1
MMAD: Multi-label Micro-Action Detection in Videos	Jul 7, 2024	Action AnalysisAction Detection	CodeCode Available	1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing	Nov 24, 2021	audio-visual event localizationVideo Understanding	CodeCode Available	1
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding	Dec 18, 2024	Highlight DetectionMoment Retrieval	CodeCode Available	1
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis	Apr 12, 2024	Dense Video CaptioningTransfer Learning	CodeCode Available	1
Localizing Moments in Long Video Via Multimodal Guidance	Feb 26, 2023	Natural Language Moment RetrievalNatural Language Visual Grounding	CodeCode Available	1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization	Aug 4, 2021	Contrastive LearningRepresentation Learning	CodeCode Available	1
F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos	Apr 11, 2025	Action UnderstandingEvent Detection	CodeCode Available	1
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness	Jan 14, 2025	Event ExtractionInstruction Following	CodeCode Available	1
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization	May 24, 2021	Action DetectionAction Localization	CodeCode Available	1
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector	Jun 7, 2022	Action ClassificationAction Detection	CodeCode Available	1
Multimodal Long Video Modeling Based on Temporal Dynamic Context	Apr 14, 2025	Video Understanding	CodeCode Available	1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge	Sep 30, 2022	DescriptiveRepresentation Learning	CodeCode Available	1
Learning the Predictability of the Future	Jun 19, 2021	Representation LearningSelf-Supervised Action Recognition	CodeCode Available	1
Learning Temporally Causal Latent Processes from General Temporal Data	Oct 11, 2021	Causal DiscoveryRepresentation Learning	CodeCode Available	1
End-to-End Video Instance Segmentation with Transformers	Nov 30, 2020	Instance SegmentationSegmentation	CodeCode Available	1
Federated Self-supervised Learning for Video Understanding	Jul 5, 2022	Action RecognitionFederated Learning	CodeCode Available	1
Learning Temporally Latent Causal Processes from General Temporal Data	Sep 29, 2021	Causal DiscoveryDisentanglement	CodeCode Available	1

Show:10 25 50

← PrevPage 13 of 46Next →

No leaderboard results yet.