SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 121-130 of 1149 papers

Title | Status | Hype
OmniVid: A Generative Framework for Universal Video Understanding | Code | 2
Understanding Long Videos with Multimodal Language Models | Code | 2
VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Code | 2
Beyond MOT: Semantic Multi-Object Tracking | Code | 2
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning | Code | 2
Multi-granularity Correspondence Learning from Long-term Noisy Videos | Code | 2
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | Code | 2
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | Code | 2
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | Code | 2
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | Code | 2
Page 13 of 115

No leaderboard results yet.