SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 141-150 of 1149 papers

Title | Status | Hype
Dense Connector for MLLMs | Code | 2
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | Code | 2
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models | Code | 2
Omni-Video: Democratizing Unified Video Understanding and Generation | Code | 2
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Code | 2
Re-thinking Temporal Search for Long-Form Video Understanding | Code | 2
Boosting Single Image Super-Resolution via Partial Channel Shifting | Code | 1
Leveraging triplet loss for unsupervised action segmentation | Code | 1

No leaderboard results yet.