Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 226–250 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
Do Language Models Understand Time?	Dec 18, 2024	Action RecognitionAnomaly Detection	CodeCode Available	1	5
PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos	Dec 2, 2024	Question AnsweringVideo Understanding	CodeCode Available	1	5
Large Scale Holistic Video Understanding	Apr 25, 2019	Action ClassificationAction Recognition	CodeCode Available	1	5
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?	Mar 27, 2022	Self-Supervised LearningSensitivity	CodeCode Available	1	5
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation	Dec 16, 2021	Contrastive LearningRepresentation Learning	CodeCode Available	1	5
Contrastive Masked Autoencoders for Self-Supervised Video Hashing	Nov 21, 2022	DecoderRetrieval	CodeCode Available	1	5
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts	May 20, 2025	Caption GenerationRetrieval	CodeCode Available	1	5
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model	Aug 15, 2023	DecoderObject	CodeCode Available	1	5
Revisiting spatio-temporal layouts for compositional action recognition	Nov 2, 2021	Action ClassificationAction Detection	CodeCode Available	1	5
PAN: Towards Fast Action Recognition via Learning Persistence of Appearance	Aug 8, 2020	Action RecognitionOptical Flow Estimation	CodeCode Available	1	5
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives	Feb 4, 2025	Video Understanding	CodeCode Available	1	5
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization	Aug 12, 2024	Action LocalizationTemporal Action Localization	CodeCode Available	1	5
Event-Free Moving Object Segmentation from Moving Ego Vehicle	Apr 28, 2023	Autonomous DrivingBenchmarking	CodeCode Available	1	5
Panoramic Vision Transformer for Saliency Detection in 360° Videos	Sep 19, 2022	Saliency DetectionSaliency Prediction	CodeCode Available	1	5
Dual-path Adaptation from Image to Video Transformers	Mar 17, 2023	Action ClassificationAction Recognition	CodeCode Available	1	5
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer	Apr 29, 2023	DecoderHighlight Detection	CodeCode Available	1	5
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning	Jan 13, 2025	Causal DiscoveryCausal Inference	CodeCode Available	1	5
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector	Jun 7, 2022	Action ClassificationAction Detection	CodeCode Available	1	5
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning	Jun 27, 2022	Action ClassificationAction Recognition	CodeCode Available	1	5
MMAD: Multi-label Micro-Action Detection in Videos	Jul 7, 2024	Action AnalysisAction Detection	CodeCode Available	1	5
Grounded Question-Answering in Long Egocentric Videos	Dec 11, 2023	Video GroundingVideo Question Answering	CodeCode Available	1	5
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding	Jun 28, 2024	Multiple-choiceVideo Understanding	CodeCode Available	1	5
Panoptic Video Scene Graph Generation	Nov 28, 2023	Graph GenerationPanoptic Scene Graph Generation	CodeCode Available	1	5
PAVE: Patching and Adapting Video Large Language Models	Mar 25, 2025	Audio-visual Question AnsweringMulti-Task Learning	CodeCode Available	1	5
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding	Jul 30, 2022	point cloud video understandingVideo Understanding	CodeCode Available	1	5

Show:10 25 50

← PrevPage 10 of 46Next →

No leaderboard results yet.