SOTAVerified

Video Understanding

A crucial task in Video Understanding is to recognise and localise, in space and time, the different actions or events appearing in a video.

Source: Action Detection from a Robot-Car Perspective
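To make "recognise and localise (in space and time)" concrete, one spatio-temporal action detection can be thought of as an action label plus a time interval and per-frame bounding boxes. A minimal sketch of such a record (the class and field names are illustrative assumptions, not any benchmark's official schema):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ActionDetection:
    """One spatio-temporal action detection: what, when, and where.

    Illustrative schema only; field names are assumptions, not the
    format of any particular dataset or library.
    """
    label: str                       # action class, e.g. "waving"
    start_sec: float                 # temporal localisation: start time
    end_sec: float                   # temporal localisation: end time
    # spatial localisation: one (x1, y1, x2, y2) box per sampled frame
    boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)
    score: float = 1.0               # detector confidence

    def duration(self) -> float:
        return self.end_sec - self.start_sec

det = ActionDetection("waving", start_sec=2.0, end_sec=4.5,
                      boxes=[(10, 20, 110, 220), (12, 21, 112, 222)],
                      score=0.87)
print(det.duration())  # → 2.5
```

A full detector would emit a list of such records per video; evaluation then compares them to ground truth with temporal and spatial IoU thresholds.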

Papers

Showing 111–120 of 1149 papers

Title | Status | Hype
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Code | 2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | Code | 2
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Code | 2
Online Video Understanding: OVBench and VideoChat-Online | Code | 2
PyTorchVideo: A Deep Learning Library for Video Understanding | Code | 2
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Code | 2
Beyond MOT: Semantic Multi-Object Tracking | Code | 2
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
A Content-Driven Micro-Video Recommendation Dataset at Scale | Code | 2
Page 12 of 115

No leaderboard results yet.