SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 621-630 of 1149 papers

Title | Status | Hype
CAST: Cross-Attention in Space and Time for Video Action Recognition | Code | 1
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | Code | 2
Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties | Code | 1
Panoptic Video Scene Graph Generation | Code | 1
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Code | 2
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning | Code | 1
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation |  | 0
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding | Code | 1
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Code | 2
Vamos: Versatile Action Models for Video Understanding | Code | 0

No leaderboard results yet.