SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 21-30 of 1149 papers

Title | Status | Hype
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding | - | 0
EVA02-AT: Egocentric Video-Language Understanding with Spatial-Temporal Rotary Positional Embeddings and Symmetric Optimization | Code | 0
MambaMia: A State-Space-Model-Based Compression for Efficient Video Understanding in Large Multimodal Models | - | 0
AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding | Code | 0
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Code | 1
Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation | Code | 1
VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Code | 2
HopaDIFF: Holistic-Partial Aware Fourier Conditioned Diffusion for Referring Human Action Segmentation in Multi-Person Scenarios | Code | 0
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | - | 0

No leaderboard results yet.