
Video Understanding

A crucial task in video understanding is to recognise and localise (in space and time) the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 451–475 of 1149 papers

Title	Date	Tasks	Status
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering	Nov 29, 2021	Diversity, Question Answering	Unverified
Interpretable Action Recognition on Hard to Classify Actions	Sep 19, 2024	Action Recognition, Depth Estimation	Unverified
Localizing Events in Videos with Multimodal Queries	Jun 14, 2024	Natural Language Queries, Video Understanding	Unverified
In-the-Wild Video Question Answering	Oct 1, 2022	Evidence Selection, Question Answering	Unverified
Long Activity Video Understanding using Functional Object-Oriented Network	Jul 3, 2018	Object, Video Understanding	Unverified
IPAD: Industrial Process Anomaly Detection Dataset	Apr 23, 2024	Anomaly Detection, Video Anomaly Detection	Unverified
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation	Jul 13, 2023	Action Recognition, Contrastive Learning	Unverified
IQViC: In-context, Question Adaptive Vision Compressor for Long-term Video Understanding LMMs	Dec 13, 2024	Question Answering, Video Question Answering	Unverified
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Jan 21, 2025	Instruction Following, Mathematical Reasoning	Unverified
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition	Jan 11, 2019	Action Classification, Action Recognition	Unverified
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output	Jul 3, 2024	Articles, Image Comprehension	Unverified
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Apr 2, 2025	Action Recognition, All	Unverified
DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning	Aug 29, 2024	Multi-Task Learning, Prompt Learning	Unverified
Integrated Object Detection and Tracking with Tracklet-Conditioned Detection	Nov 27, 2018	Object, object-detection	Unverified
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning	Jan 1, 2024	object-detection, Object Detection	Unverified
Jointly Learning Energy Expenditures and Activities Using Egocentric Multimodal Signals	Jul 1, 2017	Video Understanding	Unverified
Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection	Dec 6, 2024	GPU, Multi-Object Tracking	Unverified
Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input	Aug 28, 2024	Language Modeling, Language Modelling	Unverified
Instrument-tissue Interaction Detection Framework for Surgical Video Understanding	Mar 30, 2024	Video Understanding	Unverified
KeyVideoLLM: Towards Large-scale Video Keyframe Selection	Jul 3, 2024	Data Compression, Management	Unverified
InstructionBench: An Instructional Video Understanding Benchmark	Apr 7, 2025	Common Sense Reasoning, Multiple-choice	Unverified
KnowIT VQA: Answering Knowledge-Based Questions about Videos	Oct 23, 2019	Question Answering, Video Question Answering	Unverified
Knowledge-Based Visual Question Answering in Videos	Apr 17, 2020	Question Answering, Video Question Answering	Unverified
Koala: Key frame-conditioned long video-LLM	Apr 5, 2024	Action Recognition, Question Answering	Unverified
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding	Oct 31, 2021	Action Recognition, Text Detection	Unverified



No leaderboard results yet.