SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 481-490 of 1149 papers

Title | Status | Hype
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - | 0
Localizing Events in Videos with Multimodal Queries | - | 0
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs | Code | 2
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living | - | 0
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Code | 3
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Code | 3
LVBench: An Extreme Long Video Understanding Benchmark | Code | 2
Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models | - | 0
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Code | 1
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD | - | 0

No leaderboard results yet.