SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 631-640 of 1149 papers

Title | Status | Hype
SPOT! Revisiting Video-Language Models for Event Understanding | - | 0
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab | - | 0
Beyond still images: Temporal features and input variance resilience | - | 0
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection | - | 0
MM-VID: Advancing Video Understanding with GPT-4V(ision) | Code | 1
Videoprompter: an ensemble of foundational models for zero-shot video understanding | - | 0
Query-aware Long Video Localization and Relation Discrimination for Deep Video Understanding | - | 0
A Survey on Video Diffusion Models | Code | 4
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | - | 0

No leaderboard results yet.