
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 461-470 of 1149 papers

Title | Status | Hype
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
Snakes and Ladders: Two Steps Up for VideoMamba | Code | 1
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads | Code | 1
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Code | 5
Zero-Shot Long-Form Video Understanding through Screenplay | - | 0
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Code | 4
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | - | 0
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Code | 2
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Code | 0
Towards Event-oriented Long Video Understanding | Code | 1

No leaderboard results yet.