SOTAVerified|Agents Browse Leaderboard About Blog

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–10 of 1149 papers

Title	Date	Tasks	Status	Hype
VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding	Jul 17, 2025	Video GroundingVideo Understanding	—Unverified	0
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks	Jul 15, 2025	Video CaptioningVideo Understanding	CodeCode Available	1
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments	Jul 14, 2025	Scene UnderstandingSpatial Reasoning	—Unverified	0
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI	Jul 14, 2025	Large Language ModelMultimodal Large Language Model	—Unverified	0
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding	Jul 8, 2025	Autonomous DrivingVideo Understanding	CodeCode Available	1
Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models	Jul 8, 2025	Future predictionLarge Language Model	—Unverified	0
Omni-Video: Democratizing Unified Video Understanding and Generation	Jul 8, 2025	Video GenerationVideo Understanding	CodeCode Available	2
Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation	Jul 8, 2025	Depth EstimationDepth Prediction	—Unverified	0
Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges	Jul 2, 2025	Video Understanding	—Unverified	0
Kwai Keye-VL Technical Report	Jul 2, 2025	Instruction FollowingReinforcement Learning (RL)	CodeCode Available	4

Show:10 25 50

← PrevPage 1 of 115Next →

No leaderboard results yet.