
Video Understanding

A crucial task in Video Understanding is to recognise and localise (in space and time) the different actions or events appearing in a video.

Source: Action Detection from a Robot-Car Perspective
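To make the task definition concrete, a single spatio-temporal action detection can be thought of as a class label plus a temporal interval and per-frame spatial boxes. The sketch below is purely illustrative; the `ActionDetection` type and its field names are assumptions for this example, not the API of any specific benchmark or model.

```python
from dataclasses import dataclass

@dataclass
class ActionDetection:
    """One action localised in space and time (illustrative, not a real API)."""
    label: str                               # recognised action class
    start_s: float                           # temporal localisation: start (seconds)
    end_s: float                             # temporal localisation: end (seconds)
    boxes: list[tuple[int, int, int, int]]   # per-frame (x, y, w, h) spatial boxes
    score: float                             # detection confidence in [0, 1]

# Hypothetical detection from a robot-car style video
det = ActionDetection(
    label="pedestrian-crossing",
    start_s=12.4,
    end_s=15.0,
    boxes=[(120, 80, 40, 90), (124, 82, 41, 92)],
    score=0.87,
)
print(f"{det.label}: {det.start_s:.1f}s to {det.end_s:.1f}s, {len(det.boxes)} boxes")
```

A model solving this task would emit a list of such detections per video; evaluation then matches them against ground truth both temporally (interval overlap) and spatially (box overlap).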

Papers

Showing 191–200 of 1,149 papers

| Title | Status | Hype |
| --- | --- | --- |
| On the Limitations of Vision-Language Models in Understanding Image Transforms | | 0 |
| Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization | | 0 |
| FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models | | 0 |
| Everything Can Be Described in Words: A Simple Unified Multi-Modal Framework with Semantic and Temporal Alignment | | 0 |
| VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers | | 0 |
| VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary | Code | 4 |
| Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding | | 0 |
| Generative Frame Sampler for Long Video Understanding | | 0 |
| Memory-enhanced Retrieval Augmentation for Long Video Understanding | | 0 |
| QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension | Code | 2 |
Page 20 of 115

No leaderboard results yet.