SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 191-200 of 1149 papers

Title | Status | Hype
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning | Code | 1
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Code | 1
From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities | Code | 1
Unifying Specialized Visual Encoders for Video Language Models | Code | 1
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment | Code | 1
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Code | 1
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding | Code | 1
Do Language Models Understand Time? | Code | 1
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | Code | 1
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1

No leaderboard results yet.