SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 241–250 of 1149 papers

Title | Status | Hype
Learning Optical Flow with Adaptive Graph Reasoning | Code | 1
Relational Self-Attention: What's Missing in Attention for Video Understanding | Code | 1
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Code | 1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner | Code | 1
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment | Code | 1
Compositional Video Understanding with Spatiotemporal Structure-based Transformers | Code | 1
Language Repository for Long Video Understanding | Code | 1
Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Code | 1
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Code | 1
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs | Code | 1

No leaderboard results yet.