Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 526–550 of 1149 papers

Title	Date	Tasks	Status	Score
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision	Jun 6, 2025	Video Understanding	Code Available	5
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	Code Available	5
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding	Jun 3, 2025	Video Understanding	Code Available	5
MINOTAUR: Multi-task Video Grounding From Multimodal Queries	Feb 16, 2023	Action Detection, Sentence	Code Available	5
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022	Nov 18, 2022	Object State Change Classification, Temporal Localization	Code Available	5
A Challenge to Build Neuro-Symbolic Video Agents	May 20, 2025	Scene Classification, Video Retrieval	Code Available	5
Representation Flow for Action Recognition	Oct 2, 2018	Action Classification, Action Recognition	Code Available	5
Learning to Visually Connect Actions and their Effects	Jan 19, 2024	Object Tracking, Task Planning	Unverified	0
Learning to Focus on the Foreground for Temporal Sentence Grounding	Oct 1, 2022	Sentence, Temporal Sentence Grounding	Unverified	0
Efficient Annotation and Learning for 3D Hand Pose Estimation: A Survey	Jun 5, 2022	3D Hand Pose Estimation, Domain Adaptation	Unverified	0
Learning text-to-video retrieval from image captioning	Apr 26, 2024	Image Captioning, Image Retrieval	Unverified	0
Learning Space-Time Semantic Correspondences	Jun 16, 2023	Imitation Learning, Semantic correspondence	Unverified	0
An Effective Way to Improve YouTube-8M Classification Accuracy in Google Cloud Platform	Jun 26, 2017	Classification, Deep Learning	Unverified	0
Learning reusable concepts across different egocentric video understanding tasks	May 30, 2025	Video Understanding	Unverified	0
EAGLE: Egocentric AGgregated Language-video Engine	Sep 26, 2024	Action Recognition, Activity Recognition	Unverified	0
Learning Object State Changes in Videos: An Open-World Perspective	Dec 19, 2023	Video Understanding	Unverified	0
Learning Higher-order Object Interactions for Keypoint-based Video Understanding	May 16, 2023	Action Localization, Action Recognition	Unverified	0
Learning from Multiple Sources for Video Summarisation	Jan 13, 2015	Clustering, Video Understanding	Unverified	0
DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding	Jun 4, 2025	MME, Video MME	Unverified	0
BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes	Apr 4, 2024	Object, Video Understanding	Unverified	0
An Attempt towards Interpretable Audio-Visual Video Captioning	Dec 7, 2018	Audio captioning, Audio-Visual Video Captioning	Unverified	0
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Nov 19, 2024	GPU, Question Answering	Unverified	0
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment	Jun 8, 2023	Video Understanding	Unverified	0
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking	Jun 7, 2021	Graph Neural Network, Multi-Person Pose Estimation	Unverified	0
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding	Nov 19, 2024	Question Answering, Video Understanding	Unverified	0
