SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 176–200 of 1149 papers

Title | Status | Hype
MultiSports: A Multi-Person Video Dataset of Spatio-Temporally Localized Sports Actions | Code | 1
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
Multimodal Distillation for Egocentric Action Recognition | Code | 1
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
Multimodal Long Video Modeling Based on Temporal Dynamic Context | Code | 1
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding | Code | 1
AutoVideo: An Automated Video Action Recognition System | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
Action Scene Graphs for Long-Form Understanding of Egocentric Videos | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding | Code | 1
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
Large Scale Holistic Video Understanding | Code | 1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1
Agentic Keyframe Search for Video Question Answering | Code | 1
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Code | 1
CyberV: Cybernetics for Test-time Scaling in Video Understanding | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering | Code | 1

No leaderboard results yet.