SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 131-140 of 1149 papers

Title | Status | Hype
LongVLM: Efficient Long Video Understanding via Large Language Models | Code | 2
Foundation Models for Video Understanding: A Survey | Code | 2
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Code | 2
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Code | 2
Leveraging Temporal Contextualization for Video Action Recognition | Code | 2
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Code | 2
LinVT: Empower Your Image-level Large Language Model to Understand Videos | Code | 2
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Code | 2
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Code | 2

No leaderboard results yet.