
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 141–150 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
Dense Connector for MLLMs	May 22, 2024	Video Understanding	Code Available	2	5
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark	May 30, 2024	DeepFake Detection, Mamba	Code Available	2	5
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding	Nov 14, 2023	Image-based Generative Performance Benchmarking, Language Modeling	Code Available	2	5
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models	Dec 18, 2024	Reasoning Segmentation, Segmentation	Code Available	2	5
Omni-Video: Democratizing Unified Video Understanding and Generation	Jul 8, 2025	Video Generation, Video Understanding	Code Available	2	5
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory	May 29, 2025	Contrastive Learning, Text Retrieval	Code Available	2	5
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer	Jun 24, 2024	AI Agent, Large Language Model	Code Available	2	5
Re-thinking Temporal Search for Long-Form Video Understanding	Apr 3, 2025	Computational Efficiency, Form	Code Available	2	5
Boosting Single Image Super-Resolution via Partial Channel Shifting	Jan 1, 2023	Diversity, Image Super-Resolution	Code Available	1	5
Leveraging triplet loss for unsupervised action segmentation	Apr 13, 2023	Action Segmentation, Clustering	Code Available	1	5

