SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 971-980 of 1149 papers

Title | Status | Hype
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | | 0
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | | 0
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | | 0
MM-Ego: Towards Building Egocentric Multimodal LLMs | | 0
Moment Quantization for Video Temporal Grounding | | 0
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval | | 0
Morph: Flexible Acceleration for 3D CNN-based Video Understanding | | 0
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models | | 0
Motion-Guided Masking for Spatiotemporal Representation Learning | | 0
Motion Sensitive Contrastive Learning for Self-supervised Video Representation | | 0

No leaderboard results yet.