SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 51–60 of 1149 papers

Title | Status | Hype
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders | - | 0
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
DisTime: Distribution-based Time Representation for Video Large Language Models | Code | 1
VUDG: A Dataset for Video Understanding Domain Generalization | - | 0
VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | Code | 1
Time Blindness: Why Video-Language Models Can't See What Humans Can? | - | 0
Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding | - | 0
MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection | - | 0
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding | Code | 0
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | Code | 2

No leaderboard results yet.