SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 391-400 of 1149 papers

Title | Status | Hype
Is Appearance Free Action Recognition Possible? | Code | 1
Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
CyberV: Cybernetics for Test-time Scaling in Video Understanding | Code | 1
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes | Code | 1
Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models | Code | 1

No leaderboard results yet.