SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1-10 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding | | 0 |
| UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Code | 1 |
| EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | | 0 |
| Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI | | 0 |
| MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding | Code | 1 |
| Video Event Reasoning and Prediction by Fusing World Knowledge from LLMs with Vision Foundation Models | | 0 |
| Omni-Video: Democratizing Unified Video Understanding and Generation | Code | 2 |
| Beyond Appearance: Geometric Cues for Robust Video Instance Segmentation | | 0 |
| Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges | | 0 |
| Kwai Keye-VL Technical Report | Code | 4 |

No leaderboard results yet.