Video Understanding

A crucial task in video understanding is recognising and localising (in space and time) the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective
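Temporal localisation is often scored by comparing each predicted action segment with a ground-truth segment using temporal intersection-over-union (tIoU). The sketch below illustrates only that temporal part. It is an assumption-laden illustration rather than the metric of any paper listed here: the `ActionSegment` type, the helper names, the example label and the 0.5 threshold are all hypothetical. Spatial localisation would add a per-frame bounding-box IoU on top of this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionSegment:
    """Hypothetical temporal detection: an action label with start/end times in seconds."""
    label: str
    start: float
    end: float


def temporal_iou(a: ActionSegment, b: ActionSegment) -> float:
    """Temporal intersection-over-union of two segments (0.0 when they do not overlap)."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


def is_correct_detection(pred: ActionSegment, gt: ActionSegment, threshold: float = 0.5) -> bool:
    """Hypothetical rule: a prediction counts if the label matches and tIoU >= threshold."""
    return pred.label == gt.label and temporal_iou(pred, gt) >= threshold


pred = ActionSegment("pedestrian crossing", 2.0, 6.0)
gt = ActionSegment("pedestrian crossing", 3.0, 7.0)
print(temporal_iou(pred, gt))  # overlap 3 s / union 5 s
print(is_correct_detection(pred, gt))
```

Here the segments overlap for 3 s out of a 5 s union, giving a tIoU of 0.6. The prediction therefore passes at the assumed 0.5 threshold but would fail a stricter 0.7 threshold.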

Papers


Showing 251–275 of 1149 papers

Title	Date	Tasks	Status	Hype
Compositional Video Understanding with Spatiotemporal Structure-based Transformers	Jan 1, 2024	Video Understanding	Code Available	1
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation	Jun 15, 2025	Object, Semantic Segmentation	Code Available	1
Streaming Video Model	Mar 30, 2023	Action Recognition, Decoder	Code Available	1
Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties	Nov 28, 2023	In-Context Learning, Video Understanding	Code Available	1
Long Movie Clip Classification with State-Space Video Models	Apr 4, 2022	Classification, Decoder	Code Available	1
Lightweight Network Architecture for Real-Time Action Recognition	May 21, 2019	Action Recognition, CPU	Code Available	1
Leveraging triplet loss for unsupervised action segmentation	Apr 13, 2023	Action Segmentation, Clustering	Code Available	1
Clover: Towards A Unified Video-Language Alignment and Fusion Model	Jul 16, 2022	Language Modeling, Language Modelling	Code Available	1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval	Apr 18, 2021	Retrieval, Text Retrieval	Code Available	1
Learning Temporally Latent Causal Processes from General Temporal Data	Sep 29, 2021	Causal Discovery, Disentanglement	Code Available	1
Learning Temporally Causal Latent Processes from General Temporal Data	Oct 11, 2021	Causal Discovery, Representation Learning	Code Available	1
Learning the Predictability of the Future	Jun 19, 2021	Representation Learning, Self-Supervised Action Recognition	Code Available	1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization	Aug 4, 2021	Contrastive Learning, Representation Learning	Code Available	1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark	Aug 5, 2024	Dense Video Captioning, Diversity	Code Available	1
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition	Feb 14, 2021	Action Recognition, Temporal Action Localization	Code Available	1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge	Sep 30, 2022	Descriptive, Representation Learning	Code Available	1
Localizing Moments in Long Video Via Multimodal Guidance	Feb 26, 2023	Natural Language Moment Retrieval, Natural Language Visual Grounding	Code Available	1
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models	Jan 1, 2025	Action Recognition, Action Segmentation	Code Available	1
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions	May 16, 2021	Action Detection, Action Localization	Code Available	1
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation	Oct 31, 2024	Action Segmentation, Action Understanding	Code Available	1
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment	Jan 1, 2025	audio-visual learning, Knowledge Graphs	Code Available	1
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs	Apr 21, 2025	Video Understanding	Code Available	1
Language Repository for Long Video Understanding	Mar 21, 2024	EgoSchema, Question Answering	Code Available	1
Is Appearance Free Action Recognition Possible?	Jul 13, 2022	Action Recognition, Optical Flow Estimation	Code Available	1
A Simple LLM Framework for Long-Range Video Question-Answering	Dec 28, 2023	EgoSchema, Language Modelling	Code Available	1


No leaderboard results yet.