SOTAVerified|Agents Browse Leaderboard About Blog

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 61–70 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
Valley2: Exploring Multimodal Models with Scalable Vision-Language Design	Jan 10, 2025	Image CaptioningLanguage Modeling	CodeCode Available	3	5
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding	Mar 14, 2024	MambaMoment Retrieval	CodeCode Available	3	5
Online Video Understanding: OVBench and VideoChat-Online	Dec 31, 2024	Autonomous DrivingQuestion Answering	CodeCode Available	2	5
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory	May 29, 2025	Contrastive LearningText Retrieval	CodeCode Available	2	5
OmniVid: A Generative Framework for Universal Video Understanding	Mar 26, 2024	Action RecognitionDecoder	CodeCode Available	2	5
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives	Nov 30, 2023	Video Understanding	CodeCode Available	2	5
Omni-Video: Democratizing Unified Video Understanding and Generation	Jul 8, 2025	Video GenerationVideo Understanding	CodeCode Available	2	5
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?	Jan 9, 2025	BenchmarkingVideo Understanding	CodeCode Available	2	5
Neptune: The Long Orbit to Benchmarking Long Video Understanding	Dec 12, 2024	BenchmarkingMultimodal Reasoning	CodeCode Available	2	5
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs	Jun 13, 2024	BenchmarkingQuestion Answering	CodeCode Available	2	5

Show:10 25 50

← PrevPage 7 of 115Next →

No leaderboard results yet.