
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 576–600 of 1149 papers

Title	Date	Tasks	Status
Videoprompter: an ensemble of foundational models for zero-shot video understanding	Oct 23, 2023	Action Recognition, Descriptive	Unverified
Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling	Jan 13, 2025	Video Quality Assessment, Video Understanding	Unverified
Video RWKV:Video Action Recognition Based RWKV	Nov 8, 2024	Action Recognition, Representation Learning	Unverified
VideoSAVi: Self-Aligned Video Language Models without Human Supervision	Dec 1, 2024	EgoSchema, MVBench	Unverified
VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers	Mar 12, 2025	GPU, Streaming video understanding	Unverified
Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022	Jul 22, 2022	Object, Object State Change Classification	Unverified
Video Time: Properties, Encoders and Evaluation	Jul 18, 2018	Video Understanding	Unverified
Video Token Merging for Long-form Video Understanding	Oct 31, 2024	Form, Video Classification	Unverified
Video Understanding as Machine Translation	Jun 12, 2020	Machine Translation, Metric Learning	Unverified
Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs	Jul 2, 2024	Video Understanding	Unverified
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding	Mar 24, 2025	8k, GPU	Unverified
VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding	Dec 4, 2024	Hallucination, Instruction Following	Unverified
VidLPRO: A Video-Language Pre-training Framework for Robotic and Laparoscopic Surgery	Sep 7, 2024	Computational Efficiency, Contrastive Learning	Unverified
ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification	Oct 13, 2024	Contrastive Learning, Person Re-Identification	Unverified
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation	Dec 1, 2024	Instruction Following, Video Understanding	Unverified
Visual Context Window Extension: A New Perspective for Long Video Understanding	Sep 30, 2024	Video Understanding	Unverified
Visual Subtitle Feature Enhanced Video Outline Generation	Aug 24, 2022	Articles, Headline Generation	Unverified
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding	May 20, 2021	Action Segmentation, Language Modeling	Unverified
VRDFormer: End-to-End Video Visual Relation Detection With Transformers	Jan 1, 2022	Object, Relation	Unverified
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning	Mar 14, 2025	Benchmarking, Relational Reasoning	Unverified
VUDG: A Dataset for Video Understanding Domain Generalization	May 30, 2025	Domain Generalization, Multiple-choice	Unverified
Wasserstein Dependency Measure for Representation Learning	Mar 28, 2019	Object Recognition, reinforcement-learning	Unverified
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding	Mar 14, 2025	Denoising, Dense Video Captioning	Unverified
Weakly Supervised Multiclass Video Segmentation	Jun 1, 2014	Segmentation, Semantic Similarity	Unverified
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models	Jan 1, 2025	Action Localization, Temporal Action Localization	Unverified

