Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–225 of 1149 papers

Title	Date	Tasks	Status	Hype
Action Scene Graphs for Long-Form Understanding of Egocentric Videos	Dec 6, 2023	Action AnticipationForm	CodeCode Available	1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing	Nov 24, 2021	audio-visual event localizationVideo Understanding	CodeCode Available	1
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning	Jan 1, 2023	Contrastive LearningRepresentation Learning	CodeCode Available	1
Crossover Learning for Fast Online Video Instance Segmentation	Apr 13, 2021	Instance SegmentationSemantic Segmentation	CodeCode Available	1
Localizing Moments in Long Video Via Multimodal Guidance	Feb 26, 2023	Natural Language Moment RetrievalNatural Language Visual Grounding	CodeCode Available	1
Long Movie Clip Classification with State-Space Video Models	Apr 4, 2022	ClassificationDecoder	CodeCode Available	1
Lightweight Network Architecture for Real-Time Action Recognition	May 21, 2019	Action RecognitionCPU	CodeCode Available	1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation	Dec 16, 2021	Contrastive LearningRepresentation Learning	CodeCode Available	1
Contrastive Masked Autoencoders for Self-Supervised Video Hashing	Nov 21, 2022	DecoderRetrieval	CodeCode Available	1
Learning the Predictability of the Future	Jun 19, 2021	Representation LearningSelf-Supervised Action Recognition	CodeCode Available	1
Learning Temporally Latent Causal Processes from General Temporal Data	Sep 29, 2021	Causal DiscoveryDisentanglement	CodeCode Available	1
DEVIAS: Learning Disentangled Video Representations of Action and Scene	Nov 30, 2023	Action RecognitionDecoder	CodeCode Available	1
Multimodal Distillation for Egocentric Action Recognition	Jul 14, 2023	Action RecognitionKnowledge Distillation	CodeCode Available	1
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos	Jun 3, 2024	Mistake DetectionOnline Mistake Detection	CodeCode Available	1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge	Sep 30, 2022	DescriptiveRepresentation Learning	CodeCode Available	1
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition	Feb 14, 2021	Action RecognitionTemporal Action Localization	CodeCode Available	1
Dual-path Adaptation from Image to Video Transformers	Mar 17, 2023	Action ClassificationAction Recognition	CodeCode Available	1
Disentangle Your Dense Object Detector	Jul 7, 2021	DisentanglementObject	CodeCode Available	1
Learning Temporally Causal Latent Processes from General Temporal Data	Oct 11, 2021	Causal DiscoveryRepresentation Learning	CodeCode Available	1
DisTime: Distribution-based Time Representation for Video Large Language Models	May 30, 2025	Temporal LocalizationVideo Understanding	CodeCode Available	1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection	May 5, 2022	Action Detectionobject-detection	CodeCode Available	1
Object-Region Video Transformers	Oct 13, 2021	Action DetectionAction Recognition	CodeCode Available	1
Learning Optical Flow with Adaptive Graph Reasoning	Feb 8, 2022	Motion EstimationOptical Flow Estimation	CodeCode Available	1
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchemaQuestion Answering	CodeCode Available	1
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization	Mar 24, 2021	Action LocalizationTemporal Action Localization	CodeCode Available	1

Show:10 25 50

← PrevPage 9 of 46Next →

No leaderboard results yet.