SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 81-90 of 1149 papers

Title | Status | Hype
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding | Code | 2
AIN: The Arabic INclusive Large Multimodal Model | Code | 2
TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding | Code | 2
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge | Code | 2
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding | Code | 2
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? | Code | 2
Adaptive Keyframe Sampling for Long Video Understanding | Code | 2
Online Video Understanding: OVBench and VideoChat-Online | Code | 2
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models | Code | 2
PruneVid: Visual Token Pruning for Efficient Video Large Language Models | Code | 2

No leaderboard results yet.