SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 321–330 of 1149 papers

Title | Status | Hype
A Dataset for Medical Instructional Video Classification and Question Answering | Code | 1
Video Joint Modelling Based on Hierarchical Transformer for Co-summarization | Code | 1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection | Code | 1
Prompting Visual-Language Models for Efficient Video Understanding | Code | 1
TokenLearner: Adaptive Space-Time Tokenization for Videos | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling | Code | 1

No leaderboard results yet.