SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 91-100 of 1149 papers

Title | Status | Hype
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Code | 2
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition | Code | 2
Neptune: The Long Orbit to Benchmarking Long Video Understanding | Code | 2
LinVT: Empower Your Image-level Large Language Model to Understand Videos | Code | 2
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning | Code | 2
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Code | 2
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | Code | 2
StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding | Code | 2
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | Code | 2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2

No leaderboard results yet.