SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 351-375 of 1149 papers

Title | Status | Hype
Elaborative Rehearsal for Zero-shot Action Recognition | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation | Code | 1
Temporal Context Aggregation Network for Temporal Action Proposal Refinement | Code | 1
TSM: Temporal Shift Module for Efficient Video Understanding | Code | 1
TCLR: Temporal Contrastive Learning for Video Representation | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities | Code | 1
Teaching VLMs to Localize Specific Objects from In-context Examples | Code | 1
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Code | 1
EgoTaskQA: Understanding Human Tasks in Egocentric Videos | Code | 1
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos | Code | 1
Technical Report: Temporal Aggregate Representations | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Can An Image Classifier Suffice For Action Recognition? | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Code | 1
Procedure-Aware Pretraining for Instructional Video Understanding | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
An Empirical Study of End-to-End Temporal Action Detection | Code | 1
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Page 15 of 46

No leaderboard results yet.