SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 201-210 of 1149 papers

Title | Status | Hype
Action Scene Graphs for Long-Form Understanding of Egocentric Videos | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning | Code | 1
DeepSportradar-v1: Computer Vision Dataset for Sports Understanding with High Quality Annotations | Code | 1
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
Grounded Question-Answering in Long Egocentric Videos | Code | 1
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding | Code | 1
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering | Code | 1
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Code | 1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1

No leaderboard results yet.