
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 776–800 of 1149 papers

Title	Date	Tasks	Status
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection	Nov 27, 2018	Object, object-detection	Unverified
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Jul 3, 2024	Articles, Image Comprehension	Unverified
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Jan 21, 2025	Instruction Following, Mathematical Reasoning	Unverified
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation	Jul 13, 2023	Action Recognition, Contrastive Learning	Unverified
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Jan 21, 2025	Object Tracking, Referring Expression Segmentation	Unverified
InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model	Feb 26, 2025	Video Quality Assessment, Video Understanding	Unverified
Interpretable Action Recognition on Hard to Classify Actions	Sep 19, 2024	Action Recognition, Depth Estimation	Unverified
InterRVOS: Interaction-aware Referring Video Object Segmentation	Jun 3, 2025	8k, Object	Unverified
In-the-Wild Video Question Answering	Oct 1, 2022	Evidence Selection, Question Answering	Unverified
Inverse Compositional Learning for Weakly-supervised Relation Grounding	Jan 1, 2023	Relation, Video Understanding	Unverified
IPAD: Industrial Process Anomaly Detection Dataset	Apr 23, 2024	Anomaly Detection, Video Anomaly Detection	Unverified
IPFormer-VideoLLM: Enhancing Multi-modal Video Understanding for Multi-shot Scenes	Jun 26, 2025	Attribute, Question Answering	Unverified
IQViC: In-context, Question Adaptive Vision Compressor for Long-term Video Understanding LMMs	Dec 13, 2024	Question Answering, Video Question Answering	Unverified
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Apr 2, 2025	Action Recognition, All	Unverified
Joint Engagement Classification using Video Augmentation Techniques for Multi-person Human-robot Interaction	Dec 28, 2022	Data Augmentation, Face Swapping	Unverified
Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals	Jul 1, 2017	Video Understanding	Unverified
Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input	Aug 28, 2024	Language Modeling, Language Modelling	Unverified
KeyVideoLLM: Towards Large-scale Video Keyframe Selection	Jul 3, 2024	Data Compression, Management	Unverified
Kill Two Birds With One Stone: Boosting Both Object Detection Accuracy and Speed With adaptive Patch-of-Interest Composition	Aug 12, 2017	Object, object-detection	Unverified
KnowIT VQA: Answering Knowledge-Based Questions about Videos	Oct 23, 2019	Question Answering, Video Question Answering	Unverified
Knowledge-Based Visual Question Answering in Videos	Apr 17, 2020	Question Answering, Video Question Answering	Unverified
Koala: Key frame-conditioned long video-LLM	Apr 5, 2024	Action Recognition, Question Answering	Unverified
Label Denoising with Large Ensembles of Heterogeneous Neural Networks	Sep 12, 2018	Data Augmentation, Denoising	Unverified
Language as the Medium: Multimodal Video Classification through text only	Sep 19, 2023	Action Recognition, Video Classification	Unverified
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers	Apr 2, 2021	Diagnostic, Video Editing	Unverified



No leaderboard results yet.