SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 181–190 of 1149 papers

Title | Status | Hype
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Code | 1
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning | Code | 1
AutoVideo: An Automated Video Action Recognition System | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
Action Scene Graphs for Long-Form Understanding of Egocentric Videos | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
Grounded Question-Answering in Long Egocentric Videos | Code | 1
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
Agentic Keyframe Search for Video Question Answering | Code | 1
MM-VID: Advancing Video Understanding with GPT-4V(ision) | Code | 1

No leaderboard results yet.