Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Title	Date	Tasks	Status	Hype
Localizing Events in Videos with Multimodal Queries	Jun 14, 2024	Natural Language Queries, Video Understanding	Unverified	0
Localizing Unseen Activities in Video via Image Query	Jun 28, 2019	Action Localization, Video Understanding	Unverified	0
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding	Mar 17, 2025	Attribute, MME	Unverified	0
Long Activity Video Understanding using Functional Object-Oriented Network	Jul 3, 2018	Object, Video Understanding	Unverified	0
LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models	Feb 21, 2025	Caption Generation, Video Captioning	Unverified	0
Long-Short Temporal Contrastive Learning of Video Transformers	Jun 17, 2021	Action Recognition, Contrastive Learning	Unverified	0
LongVILA: Scaling Long-Context Visual Language Models for Long Videos	Aug 19, 2024	Video Captioning, Video Question Answering	Unverified	0
LongViTU: Instruction Tuning for Long-Form Video Understanding	Jan 9, 2025	EgoSchema, Form	Unverified	0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory	Mar 17, 2025	Form, GPU	Unverified	0
Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing	Nov 29, 2024	All, Form	Unverified	0
