SOTAVerified

Video Understanding

A crucial task in Video Understanding is to recognise and localise, in space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective
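Spatio-temporal action detection is often framed as producing "action tubes": a class label, a temporal extent, and per-frame bounding boxes. The sketch below is a minimal, hypothetical representation of such a detection, together with a temporal IoU helper of the kind used when matching predicted tubes to ground truth; the names and structure are illustrative, not any particular benchmark's API.

```python
from dataclasses import dataclass, field

@dataclass
class ActionTube:
    """One detected action: label, temporal extent, per-frame boxes (hypothetical sketch)."""
    label: str                                  # action class, e.g. "crossing road"
    start_frame: int                            # first frame of the action (inclusive)
    end_frame: int                              # last frame of the action (inclusive)
    boxes: dict = field(default_factory=dict)   # frame index -> (x1, y1, x2, y2)

def temporal_iou(a: ActionTube, b: ActionTube) -> float:
    """Intersection-over-union of two tubes along the time axis (frames, inclusive)."""
    inter = min(a.end_frame, b.end_frame) - max(a.start_frame, b.start_frame) + 1
    inter = max(inter, 0)
    len_a = a.end_frame - a.start_frame + 1
    len_b = b.end_frame - b.start_frame + 1
    return inter / (len_a + len_b - inter)

# A predicted tube spanning frames 10-29 vs. a ground-truth tube spanning 15-34:
pred = ActionTube("crossing", 10, 29, {10: (5, 5, 50, 90)})
gt = ActionTube("crossing", 15, 34)
print(temporal_iou(pred, gt))  # 0.6
```

A full spatio-temporal metric would additionally intersect the per-frame boxes; the temporal term alone is shown here to keep the sketch short.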

Papers

Showing 561–570 of 1149 papers

Title | Hype
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | 0
Large Scale Video Representation Learning via Relational Graph Clustering | 0
Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling | 0
DualX-VSR: Dual Axial Spatial×Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation | 0
Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges | 0
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers | 0
DTVLT: A Multi-modal Diverse Text Benchmark for Visual Language Tracking Based on LLM | 0
Beyond still images: Temporal features and input variance resilience | 0
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | 0
Abductive Ego-View Accident Video Understanding for Safe Driving Perception | 0
Page 57 of 115

No leaderboard results yet.