SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 626–650 of 1149 papers

Title	Date	Tasks	Status
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context	Nov 25, 2024	Large Language ModelMME	—Unverified
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions	Nov 24, 2024	Action ClassificationAction Recognition	CodeCode Available
ReWind: Understanding Long Videos with Instructed Learnable Memory	Nov 23, 2024	Large Language ModelQuestion Answering	—Unverified
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding	Nov 21, 2024	Computational EfficiencyVideo Understanding	—Unverified
Extending Video Masked Autoencoders to 128 frames	Nov 20, 2024	DecoderVideo Understanding	—Unverified
Principles of Visual Tokens for Efficient Video Understanding	Nov 20, 2024	Video Understanding	—Unverified
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation	Nov 20, 2024	ChatbotMultiple-choice	—Unverified
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding	Nov 19, 2024	Question AnsweringVideo Understanding	—Unverified
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction	Nov 19, 2024	GPUQuestion Answering	—Unverified
ViBe: A Text-to-Video Benchmark for Evaluating Hallucination in Large Multimodal Models	Nov 16, 2024	HallucinationVideo Generation	—Unverified
Can MLLMs Guide Weakly-Supervised Temporal Action Localization Tasks?	Nov 13, 2024	Action LocalizationTemporal Action Localization	—Unverified
EVQAScore: Efficient Video Question Answering Data Evaluation	Nov 11, 2024	Keyword ExtractionQuestion Answering	—Unverified
Video RWKV:Video Action Recognition Based RWKV	Nov 8, 2024	Action RecognitionRepresentation Learning	—Unverified
Personalized Video Summarization by Multimodal Video Understanding	Nov 5, 2024	Unsupervised Video SummarizationVideo Summarization	—Unverified
Video Token Merging for Long-form Video Understanding	Oct 31, 2024	FormVideo Classification	—Unverified
Situational Scene Graph for Structured Human-centric Situation Understanding	Oct 30, 2024	Graph GenerationPredicate Classification	CodeCode Available
Zero-Shot Action Recognition in Surveillance Videos	Oct 28, 2024	Action RecognitionVideo Understanding	—Unverified
Egocentric and Exocentric Methods: A Short Survey	Oct 27, 2024	Action RecognitionSurvey	—Unverified
Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning	Oct 26, 2024	Video Understanding	—Unverified
EVA: An Embodied World Model for Future Video Anticipation	Oct 20, 2024	Language ModelingLanguage Modelling	—Unverified
ContextDet: Temporal Action Detection with Adaptive Context Aggregation	Oct 20, 2024	Action DetectionVideo Understanding	—Unverified
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning	Oct 20, 2024	DiagnosticVideo Captioning	—Unverified
Making Every Frame Matter: Continuous Video Understanding for Large Models via Adaptive State Modeling	Oct 19, 2024	Video Understanding	—Unverified
Zero-shot Action Localization via the Confidence of Large Vision-Language Models	Oct 18, 2024	Action LocalizationLanguage Modelling	—Unverified
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models	Oct 15, 2024	Video Understanding	—Unverified

Show:10 25 50

← PrevPage 26 of 46Next →

No leaderboard results yet.