SOTAVerified

Video Understanding

A core task in Video Understanding is to recognise and localise, in space and time, the different actions or events appearing in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 271-280 of 1149 papers

Title | Status | Hype
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
QuerYD: A video dataset with high-quality text and audio narrations | Code | 1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner | Code | 1
BehAVE: Behaviour Alignment of Video Game Encodings | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection | Code | 1
Prompting Visual-Language Models for Efficient Video Understanding | Code | 1
A Simple LLM Framework for Long-Range Video Question-Answering | Code | 1
Page 28 of 115

No leaderboard results yet.