SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 151-160 of 1149 papers

Title | Status | Hype
Grounded Question-Answering in Long Egocentric Videos | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Code | 1
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | Code | 1
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering | Code | 1
A Multigrid Method for Efficiently Training Video Models | Code | 1
From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities | Code | 1
Long Movie Clip Classification with State-Space Video Models | Code | 1
