SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 1121–1130 of 1149 papers

Title | Status | Hype
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Code | 0
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-modal Knowledge Transfer | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Diagnosing Error in Temporal Action Detectors | Code | 0
Multi-attention Networks for Temporal Localization of Video-level Labels | Code | 0
MOFO: MOtion FOcused Self-Supervision for Video Understanding | Code | 0
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | Code | 0
Detection-Fusion for Knowledge Graph Extraction from Videos | Code | 0
Vamos: Versatile Action Models for Video Understanding | Code | 0
Are current long-term video understanding datasets long-term? | Code | 0

No leaderboard results yet.