SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 11–20 of 1149 papers

Title | Status | Hype
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | Code | 5
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Code | 5
VideoMamba: State Space Model for Efficient Video Understanding | Code | 5
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Code | 4
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models | Code | 4
PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Code | 4
InternVideo: General Video Foundation Models via Generative and Discriminative Learning | Code | 4
Kwai Keye-VL Technical Report | Code | 4
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | Code | 4
Flamingo: a Visual Language Model for Few-Shot Learning | Code | 4

No leaderboard results yet.