Video Understanding

A crucial task of video understanding is to recognise and localise, in both space and time, the actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 551–575 of 1149 papers

Title	Date	Tasks	Status
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval	Feb 18, 2025	Action Recognition, Moment Retrieval	Unverified
iMOVE: Instance-Motion-Aware Video Understanding	Feb 17, 2025	Computational Efficiency, Video Understanding	Unverified
Semantics-aware Test-time Adaptation for 3D Human Pose Estimation	Feb 15, 2025	3D human pose and shape estimation, 3D Human Pose Estimation	Unverified
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering	Feb 13, 2025	Classification, Prompt Engineering	Unverified
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis	Feb 11, 2025	Action Recognition, Video Description	Unverified
A Survey on Mamba Architecture for Vision Applications	Feb 11, 2025	Mamba, object-detection	Unverified
A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems	Feb 10, 2025	Autonomous Driving, Edge-computing	Unverified
CoS: Chain-of-Shot Prompting for Long Video Understanding	Feb 10, 2025	Video Understanding	Unverified
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs	Feb 6, 2025	Video Understanding	Unverified
MaxInfo: A Training-Free Key-Frame Selection Method Using Maximum Volume for Enhanced Video Understanding	Feb 5, 2025	Diversity, EgoSchema	Unverified
A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions	Feb 5, 2025	Action Quality Assessment, Survey	Unverified
LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models	Feb 4, 2025	GPU, Video Understanding	Unverified
Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	Jan 28, 2025	Decoder, Video Understanding	Unverified
Understanding Long Videos via LLM-Powered Entity Relation Graphs	Jan 27, 2025	EgoSchema, Large Language Model	Unverified
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding	Jan 25, 2025	Action Understanding, Emotion Recognition	Unverified
Temporal Preference Optimization for Long-Form Video Understanding	Jan 23, 2025	Form, MME	Unverified
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Jan 21, 2025	Instruction Following, Mathematical Reasoning	Unverified
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	Jan 21, 2025	Object Tracking, Referring Expression Segmentation	Unverified
HFGCN:Hypergraph Fusion Graph Convolutional Networks for Skeleton-Based Action Recognition	Jan 19, 2025	Action Recognition, Relation Classification	Unverified
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks	Jan 14, 2025	Language Modeling, Language Modelling	Unverified
Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling	Jan 13, 2025	Video Quality Assessment, Video Understanding	Unverified
X-LeBench: A Benchmark for Extremely Long Egocentric Video Understanding	Jan 12, 2025	Video Understanding	Unverified
Zero-shot Shark Tracking and Biometrics from Aerial Imagery	Jan 10, 2025	Video Understanding	Unverified
LongViTU: Instruction Tuning for Long-Form Video Understanding	Jan 9, 2025	EgoSchema, Form	Unverified
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding	Jan 9, 2025	Language Modeling, Language Modelling	Unverified

