SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 261–270 of 1149 papers

Title | Status | Hype
Compositional Video Understanding with Spatiotemporal Structure-based Transformers | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Procedure-Aware Pretraining for Instructional Video Understanding | Code | 1
Prompting Visual-Language Models for Efficient Video Understanding | Code | 1
QuerYD: A video dataset with high-quality text and audio narrations | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding | Code | 1
Clover: Towards A Unified Video-Language Alignment and Fusion Model | Code | 1

No leaderboard results yet.