SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 941–950 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
Long Activity Video Understanding using Functional Object-Oriented Network	Jul 3, 2018	ObjectVideo Understanding	—Unverified	0	0
LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models	Feb 21, 2025	Caption GenerationVideo Captioning	—Unverified	0	0
Long-Short Temporal Contrastive Learning of Video Transformers	Jun 17, 2021	Action RecognitionContrastive Learning	—Unverified	0	0
LongVILA: Scaling Long-Context Visual Language Models for Long Videos	Aug 19, 2024	Video CaptioningVideo Question Answering	—Unverified	0	0
LongViTU: Instruction Tuning for Long-Form Video Understanding	Jan 9, 2025	EgoSchemaForm	—Unverified	0	0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory	Mar 17, 2025	FormGPU	—Unverified	0	0
Look Every Frame All at Once: Video-Ma^2mba for Efficient Long-form Video Understanding with Multi-Axis Gradient Checkpointing	Nov 29, 2024	AllForm	—Unverified	0	0
Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization	Mar 28, 2021	Action ClassificationAction Localization	—Unverified	0	0
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents	Mar 13, 2025	Computational EfficiencyOptical Character Recognition (OCR)	—Unverified	0	0
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models	Feb 4, 2025	GPUVideo Understanding	—Unverified	0	0

Show:10 25 50

← PrevPage 95 of 115Next →

No leaderboard results yet.