Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–350 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
EPIC Fields: Marrying 3D Geometry and Video Understanding	Jun 14, 2023	3D geometryNeural Rendering	CodeCode Available	1	5
Technical Report: Temporal Aggregate Representations	Jun 6, 2021	Action AnticipationAction Recognition	CodeCode Available	1	5
Long Movie Clip Classification with State-Space Video Models	Apr 4, 2022	ClassificationDecoder	CodeCode Available	1	5
T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs	Nov 29, 2024	Data AugmentationDiversity	CodeCode Available	1	5
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction	Aug 29, 2023	Federated Learningimage-classification	CodeCode Available	1	5
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation	Mar 25, 2025	HallucinationHallucination Evaluation	CodeCode Available	1	5
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection	Apr 14, 2024	Highlight DetectionMoment Retrieval	CodeCode Available	1	5
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation	Mar 18, 2024	Referring Video Object SegmentationSemantic Segmentation	CodeCode Available	1	5
Learning Temporally Latent Causal Processes from General Temporal Data	Sep 29, 2021	Causal DiscoveryDisentanglement	CodeCode Available	1	5
Learning the Predictability of the Future	Jun 19, 2021	Representation LearningSelf-Supervised Action Recognition	CodeCode Available	1	5
Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention	Jun 11, 2021	Action RecognitionSign Language Recognition	CodeCode Available	1	5
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis	Apr 12, 2024	Dense Video CaptioningTransfer Learning	CodeCode Available	1	5
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization	Aug 4, 2021	Contrastive LearningRepresentation Learning	CodeCode Available	1	5
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs	Apr 21, 2025	Video Understanding	CodeCode Available	1	5
F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos	Apr 11, 2025	Action UnderstandingEvent Detection	CodeCode Available	1	5
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness	Jan 14, 2025	Event ExtractionInstruction Following	CodeCode Available	1	5
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning	May 22, 2025	Misinformationreinforcement-learning	CodeCode Available	1	5
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization	Mar 24, 2021	Action LocalizationTemporal Action Localization	CodeCode Available	1	5
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos	Feb 25, 2025	Graph LearningMistake Detection	CodeCode Available	1	5
TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models	Oct 14, 2024	2kBenchmarking	CodeCode Available	1	5
Test of Time: Instilling Video-Language Models with a Sense of Time	Jan 5, 2023	Video-Text RetrievalVideo Understanding	CodeCode Available	1	5
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning	Dec 4, 2024	Multimodal Large Language ModelVideo Understanding	CodeCode Available	1	5
IntentVizor: Towards Generic Query Guided Interactive Video Summarization	Sep 30, 2021	Video SummarizationVideo Understanding	CodeCode Available	1	5
-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation	Jan 31, 2025	Question AnsweringVideo Question Answering	CodeCode Available	1	5
End-to-End Video Instance Segmentation with Transformers	Nov 30, 2020	Instance SegmentationSegmentation	CodeCode Available	1	5
Streaming Video Temporal Action Segmentation In Real Time	Sep 28, 2022	Action SegmentationLanguage Modelling	CodeCode Available	1	5
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding	Jul 11, 2024	EEGLanguage Modeling	CodeCode Available	1	5
End-to-end Temporal Action Detection with Transformer	Jun 18, 2021	Action DetectionTemporal Action Localization	CodeCode Available	1	5
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding	Jun 28, 2024	Multiple-choiceVideo Understanding	CodeCode Available	1	5
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning	Sep 27, 2023	Action RecognitionAction Segmentation	CodeCode Available	1	5
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization	May 24, 2021	Action DetectionAction Localization	CodeCode Available	1	5
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models	Jan 1, 2025	Action RecognitionAction Segmentation	CodeCode Available	1	5
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding	Jul 8, 2025	Autonomous DrivingVideo Understanding	CodeCode Available	1	5
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos	Aug 18, 2023	point cloud video understandingSelf-Supervised Learning	CodeCode Available	1	5
End-to-End Referring Video Object Segmentation with Multimodal Transformers	Nov 29, 2021	Inductive BiasInstance Segmentation	CodeCode Available	1	5
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models	Mar 20, 2025	Multiple-choiceVideo Understanding	CodeCode Available	1	5
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges	Nov 17, 2022	Future Hand PredictionMoment Queries	CodeCode Available	1	5
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation	Dec 12, 2023	Anomaly DetectionAutonomous Driving	CodeCode Available	1	5
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Oct 24, 2024	document understandingVideo Understanding	CodeCode Available	1	5
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding	Mar 20, 2025	Video UnderstandingZero-shot Generalization	CodeCode Available	1	5
MMAD: Multi-label Micro-Action Detection in Videos	Jul 7, 2024	Action AnalysisAction Detection	CodeCode Available	1	5
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?	Mar 27, 2022	Self-Supervised LearningSensitivity	CodeCode Available	1	5
Compositional Video Understanding with Spatiotemporal Structure-based Transformers	Jan 1, 2024	Video Understanding	CodeCode Available	1	5
An overview on the evaluated video retrieval tasks at TRECVID 2022	Jun 22, 2023	Ad-hoc video searchRetrieval	CodeCode Available	1	5
Stochastic Image-to-Video Synthesis using cINNs	May 10, 2021	DiversityVideo Understanding	CodeCode Available	1	5
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives	Feb 4, 2025	Video Understanding	CodeCode Available	1	5
Large Scale Holistic Video Understanding	Apr 25, 2019	Action ClassificationAction Recognition	CodeCode Available	1	5
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning	Mar 2, 2025	Large Language ModelMulti-Instance Retrieval	CodeCode Available	1	5
FrameExit: Conditional Early Exiting for Efficient Video Recognition	Apr 27, 2021	Video RecognitionVideo Understanding	CodeCode Available	1	5
A Comprehensive Study of Deep Video Action Recognition	Dec 11, 2020	Action RecognitionDeep Learning	CodeCode Available	1	5

Show:10 25 50

← PrevPage 7 of 23Next →

No leaderboard results yet.