
Video Understanding

A crucial task in Video Understanding is to recognise and localise, in both space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 251-260 of 1149 papers

Title | Status | Hype
HFGCN:Hypergraph Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition | | 0
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation | Code | 1
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness | Code | 1
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | | 0
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | Code | 4
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning | Code | 1
Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling | | 0
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning | Code | 1
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding | | 0
From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities | Code | 1

No leaderboard results yet.