SOTAVerified|Agents Browse Leaderboard About Blog

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 41–50 of 1149 papers

Title	Date	Tasks	Status	Hype
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection	Feb 27, 2025	Action DetectionBenchmarking	CodeCode Available	3
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuray	Feb 7, 2025	4kGeneral Knowledge	CodeCode Available	3
VideoRoPE: What Makes for Good Video Rotary Position Embedding?	Feb 7, 2025	HallucinationPosition	CodeCode Available	3
Valley2: Exploring Multimodal Models with Scalable Vision-Language Design	Jan 10, 2025	Image CaptioningLanguage Modeling	CodeCode Available	3
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM	Dec 31, 2024	ObjectVideo Understanding	CodeCode Available	3
VisionZip: Longer is Better but Not Necessary in Vision Language Models	Dec 5, 2024	Video UnderstandingVisual Question Answering	CodeCode Available	3
Towards Universal Soccer Video Understanding	Dec 2, 2024	Action ClassificationSports Understanding	CodeCode Available	3
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	Nov 20, 2024	GPUMME	CodeCode Available	3
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding	Oct 22, 2024	Token ReductionVideo Question Answering	CodeCode Available	3
SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference	Oct 6, 2024	Language ModelingLanguage Modelling	CodeCode Available	3

Show:10 25 50

← PrevPage 5 of 115Next →

No leaderboard results yet.