SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 221–230 of 1149 papers

Title | Status | Hype
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads | Code | 1
Snakes and Ladders: Two Steps Up for VideoMamba | Code | 1
Towards Event-oriented Long Video Understanding | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
Slot State Space Models | Code | 1
VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Code | 1
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Code | 1
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos | Code | 1
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos | Code | 1

No leaderboard results yet.