SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 901–925 of 1149 papers

Title	Date	Tasks	Status
Kill Two Birds With One Stone: Boosting Both Object Detection Accuracy and Speed With adaptive Patch-of-Interest Composition	Aug 12, 2017	Objectobject-detection	—Unverified
KnowIT VQA: Answering Knowledge-Based Questions about Videos	Oct 23, 2019	Question AnsweringVideo Question Answering	—Unverified
Knowledge-Based Visual Question Answering in Videos	Apr 17, 2020	Question AnsweringVideo Question Answering	—Unverified
Koala: Key frame-conditioned long video-LLM	Apr 5, 2024	Action RecognitionQuestion Answering	—Unverified
Label Denoising with Large Ensembles of Heterogeneous Neural Networks	Sep 12, 2018	Data AugmentationDenoising	—Unverified
Language as the Medium: Multimodal Video Classification through text only	Sep 19, 2023	Action RecognitionVideo Classification	—Unverified
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers	Apr 2, 2021	DiagnosticVideo Editing	—Unverified
Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges	Jul 2, 2025	Video Understanding	—Unverified
Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling	Sep 21, 2018	General ClassificationVideo Classification	—Unverified
Large Scale Video Representation Learning via Relational Graph Clustering	Jun 1, 2020	ClusteringGraph Clustering	—Unverified
Large-Scale YouTube-8M Video Understanding with Deep Neural Networks	Jun 14, 2017	ClassificationGeneral Classification	—Unverified
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision	Apr 15, 2023	Language ModelingLanguage Modelling	—Unverified
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection	Aug 8, 2021	Action DetectionKnowledge Distillation	—Unverified
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval	Apr 3, 2025	Information RetrievalRepresentation Learning	—Unverified
Learning Dynamic MRI Reconstruction with Convolutional Network Assisted Reconstruction Swin Transformer	Sep 19, 2023	AnatomyComputational Efficiency	—Unverified
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking	Jun 7, 2021	Graph Neural NetworkMulti-Person Pose Estimation	—Unverified
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment	Jun 8, 2023	Video Understanding	—Unverified
Learning from Multiple Sources for Video Summarisation	Jan 13, 2015	ClusteringVideo Understanding	—Unverified
Learning Higher-order Object Interactions for Keypoint-based Video Understanding	May 16, 2023	Action LocalizationAction Recognition	—Unverified
Learning Object State Changes in Videos: An Open-World Perspective	Dec 19, 2023	Video Understanding	—Unverified
Learning reusable concepts across different egocentric video understanding tasks	May 30, 2025	Video Understanding	—Unverified
Learning Space-Time Semantic Correspondences	Jun 16, 2023	Imitation LearningSemantic correspondence	—Unverified
Learning text-to-video retrieval from image captioning	Apr 26, 2024	Image CaptioningImage Retrieval	—Unverified
Learning to Focus on the Foreground for Temporal Sentence Grounding	Oct 1, 2022	SentenceTemporal Sentence Grounding	—Unverified
Learning to Visually Connect Actions and their Effects	Jan 19, 2024	Object TrackingTask Planning	—Unverified

Show:10 25 50

← PrevPage 37 of 46Next →

No leaderboard results yet.