Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Title	Date	Tasks	Status	Hype
End-to-end Temporal Action Detection with Transformer	Jun 18, 2021	Action Detection, Temporal Action Localization	Code Available	1
Learning the Predictability of the Future	Jun 19, 2021	Representation Learning, Self-Supervised Action Recognition	Code Available	1
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads	Jun 27, 2024	Diversity, image-classification	Code Available	1
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning	Sep 27, 2023	Action Recognition, Action Segmentation	Code Available	1
Open-Vocabulary Video Relation Extraction	Dec 25, 2023	Action Classification, Action Understanding	Code Available	1
End-to-End Referring Video Object Segmentation with Multimodal Transformers	Nov 29, 2021	Inductive Bias, Instance Segmentation	Code Available	1
Panoramic Vision Transformer for Saliency Detection in 360° Videos	Sep 19, 2022	Saliency Detection, Saliency Prediction	Code Available	1
Learning Temporally Causal Latent Processes from General Temporal Data	Oct 11, 2021	Causal Discovery, Representation Learning	Code Available	1
PAVE: Patching and Adapting Video Large Language Models	Mar 25, 2025	Audio-visual Question Answering, Multi-Task Learning	Code Available	1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge	Sep 30, 2022	Descriptive, Representation Learning	Code Available	1
Learning Optical Flow with Adaptive Graph Reasoning	Feb 8, 2022	Motion Estimation, Optical Flow Estimation	Code Available	1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Oct 24, 2024	document understanding, Video Understanding	Code Available	1
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization	Mar 24, 2021	Action Localization, Temporal Action Localization	Code Available	1
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment	Jan 1, 2025	audio-visual learning, Knowledge Graphs	Code Available	1
An overview on the evaluated video retrieval tasks at TRECVID 2022	Jun 22, 2023	Ad-hoc video search, Retrieval	Code Available	1
Procedure-Aware Pretraining for Instructional Video Understanding	Mar 31, 2023	Video Understanding	Code Available	1
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection	Dec 9, 2021	Boundary Detection, Diversity	Code Available	1
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchema, Question Answering	Code Available	1
Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition	Jan 1, 2021	Action Recognition, Video Understanding	Code Available	1
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs	Apr 21, 2025	Video Understanding	Code Available	1
A Comprehensive Study of Deep Video Action Recognition	Dec 11, 2020	Action Recognition, Deep Learning	Code Available	1
Elaborative Rehearsal for Zero-shot Action Recognition	Aug 5, 2021	Action Recognition, Few-Shot Learning	Code Available	1
Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions	May 19, 2022	Contrastive Learning, Self-Supervised Learning	Code Available	1
FrameExit: Conditional Early Exiting for Efficient Video Recognition	Apr 27, 2021	Video Recognition, Video Understanding	Code Available	1
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation	Oct 31, 2024	Action Segmentation, Action Understanding	Code Available	1


No leaderboard results yet.