Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 861–870 of 1149 papers

Title	Date	Tasks	Status	Hype
Mimic The Raw Domain: Accelerating Action Recognition in the Compressed Domain	Nov 19, 2019	Action Recognition, Video Recognition	Unverified	0
M-LLM Based Video Frame Selection for Efficient Video Understanding	Feb 27, 2025	EgoSchema, Language Modeling	Unverified	0
MLVTG: Mamba-Based Feature Alignment and LLM-Driven Purification for Multi-Modal Video Temporal Grounding	Jun 10, 2025	Language Modeling, Language Modelling	Unverified	0
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	Sep 30, 2024	Mixture-of-Experts, Optical Character Recognition (OCR)	Unverified	0
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding	Jun 20, 2024	Form, Video Understanding	Unverified	0
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning	May 28, 2024	Decision Making, Video Understanding	Unverified	0
MM-Ego: Towards Building Egocentric Multimodal LLMs	Oct 9, 2024	Video Understanding	Unverified	0
Moment Quantization for Video Temporal Grounding	Apr 3, 2025	Quantization, Video Understanding	Unverified	0
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval	Feb 18, 2025	Action Recognition, Moment Retrieval	Unverified	0
Morph: Flexible Acceleration for 3D CNN-based Video Understanding	Oct 16, 2018	MORPH, Video Recognition	Unverified	0


No leaderboard results yet.