SOTAVerified

Video Understanding

A crucial task in video understanding is to recognise and localise, in space and time, the different actions or events that appear in a video.
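To make "localise in space and time" concrete, here is a minimal sketch of one possible output record for a detected action and of how two such detections might be compared. The `ActionTube` class, the helper names, and the `"walking"` label are hypothetical. Using temporal IoU times mean per-frame box IoU as the spatio-temporal overlap is one common convention in action-detection evaluation, not necessarily the one used by the papers listed here.

```python
from dataclasses import dataclass, field

# Axis-aligned box (x1, y1, x2, y2) in pixels; this layout is an assumption.
Box = tuple[float, float, float, float]


@dataclass
class ActionTube:
    """Hypothetical record for one detected action: what, when, and where."""
    label: str        # what: the action class
    start_frame: int  # when: first frame (inclusive)
    end_frame: int    # when: last frame (inclusive)
    boxes: dict[int, Box] = field(default_factory=dict)  # where: frame -> box

    def duration(self) -> int:
        return self.end_frame - self.start_frame + 1


def temporal_iou(a: ActionTube, b: ActionTube) -> float:
    """Overlap of the two inclusive frame intervals divided by their union."""
    inter = max(0, min(a.end_frame, b.end_frame) - max(a.start_frame, b.start_frame) + 1)
    union = a.duration() + b.duration() - inter
    return inter / union if union > 0 else 0.0


def box_iou(p: Box, q: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(p[2], q[2]) - max(p[0], q[0]))
    iy = max(0.0, min(p[3], q[3]) - max(p[1], q[1]))
    inter = ix * iy
    union = (p[2] - p[0]) * (p[3] - p[1]) + (q[2] - q[0]) * (q[3] - q[1]) - inter
    return inter / union if union > 0 else 0.0


def spatiotemporal_iou(a: ActionTube, b: ActionTube) -> float:
    """Temporal IoU times the mean box IoU over frames where both tubes have a box."""
    t = temporal_iou(a, b)
    first, last = max(a.start_frame, b.start_frame), min(a.end_frame, b.end_frame)
    shared = [f for f in range(first, last + 1) if f in a.boxes and f in b.boxes]
    if t == 0.0 or not shared:
        return 0.0
    mean_spatial = sum(box_iou(a.boxes[f], b.boxes[f]) for f in shared) / len(shared)
    return t * mean_spatial


# Usage: a prediction that overlaps the ground truth for half of each interval
# and is shifted by half a box width in space.
pred = ActionTube("walking", 10, 29, {f: (0.0, 0.0, 10.0, 10.0) for f in range(10, 30)})
gt = ActionTube("walking", 20, 39, {f: (5.0, 0.0, 15.0, 10.0) for f in range(20, 40)})

print(round(temporal_iou(pred, gt), 4))        # 10 shared frames / 30 in the union
print(round(spatiotemporal_iou(pred, gt), 4))  # (1/3 temporal) * (1/3 spatial)
```

Keeping the "when" (frame interval) and "where" (per-frame boxes) parts separate lets the same record serve purely temporal localisation, where `boxes` is left empty, as well as full spatio-temporal detection.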

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 531-540 of 1149 papers

Title | Status | Hype
Highlight Timestamp Detection Model for Comedy Videos via Multimodal Sentiment Analysis | | 0
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding | | 0
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search | | 0
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD | | 0
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding | | 0
Deep Spatio-Temporal Random Fields for Efficient Video Segmentation | | 0
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding | | 0
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | | 0
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | | 0
Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions | | 0

No leaderboard results yet.