Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 576–600 of 1149 papers

Title	Date	Tasks	Status
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving	Jan 8, 2025	Autonomous Driving, Mamba	Unverified
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs	Jan 8, 2025	EgoSchema, Object Tracking	Unverified
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models	Jan 6, 2025	Benchmarking, Feature Compression	Unverified
HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding	Jan 3, 2025	Question Answering, Video Understanding	Code Available
HuMoCon: Concept Discovery for Human Motion Understanding	Jan 1, 2025	Video Understanding	Unverified
Efficient Motion-Aware Video MLLM	Jan 1, 2025	Question Answering, Video Question Answering	Unverified
VEU-Bench: Towards Comprehensive Understanding of Video Editing	Jan 1, 2025	Video Editing, Video Understanding	Unverified
Video Language Model Pretraining with Spatio-temporal Masking	Jan 1, 2025	Decoder, Language Modeling	Unverified
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs	Jan 1, 2025	Multiple-choice, Video Generation	Unverified
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Jan 1, 2025	GPU, Question Answering	Unverified
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding	Jan 1, 2025	Question Answering, Video Understanding	Unverified
Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception	Jan 1, 2025	Autonomous Driving, Gesture Recognition	Unverified
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models	Jan 1, 2025	Action Localization, Temporal Action Localization	Unverified
Flexible Frame Selection for Efficient Video Reasoning	Jan 1, 2025	Language Modeling, Language Modelling	Unverified
OV-HHIR: Open Vocabulary Human Interaction Recognition Using Cross-modal Integration of Large Language Models	Dec 31, 2024	Activity Recognition, Human Interaction Recognition	Unverified
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding	Dec 31, 2024	Robot Manipulation, Scene Understanding	Unverified
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval	Dec 31, 2024	Retrieval, Text Retrieval	Unverified
Detection-Fusion for Knowledge Graph Extraction from Videos	Dec 30, 2024	Knowledge Graphs, Language Modeling	Code Available
MVTamperBench: Evaluating Robustness of Vision-Language Models	Dec 27, 2024	Video Understanding	Unverified
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries	Dec 26, 2024	Question Answering, Video Question Answering	Unverified
HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data	Dec 23, 2024	Action Recognition, Video Understanding	Unverified
Video Domain Incremental Learning for Human Action Recognition in Home Environments	Dec 22, 2024	Action Recognition, class-incremental learning	Unverified
FriendsQA: A New Large-Scale Deep Video Understanding Dataset with Fine-grained Topic Categorization for Story Videos	Dec 22, 2024	Language Modelling, Large Language Model	Code Available
ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries	Dec 17, 2024	Human Detection, image-classification	Unverified
FocusChat: Text-guided Long Video Understanding via Spatiotemporal Information Filtering	Dec 17, 2024	Language Modeling, Language Modelling	Unverified

