SOTAVerified|Agents Browse Leaderboard About

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 451–475 of 1149 papers

Title	Date	Tasks	Status
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles	May 22, 2025	EgoSchemaFew-Shot Learning	—Unverified
ViQAgent: Zero-Shot Video Question Answering via Agent with Open-Vocabulary Grounding Validation	May 21, 2025	Decision MakingLanguage Modeling	CodeCode Available
LiveVLM: Efficient Online Video Understanding via Streaming-Oriented KV Cache and Retrieval	May 21, 2025	Autonomous DrivingQuestion Answering	—Unverified
Clapper: Compact Learning and Video Representation in VLMs	May 21, 2025	Video Understanding	—Unverified
ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning	May 21, 2025	Pseudo LabelReinforcement Learning (RL)	—Unverified
Leveraging Foundation Models for Multimodal Graph-Based Action Recognition	May 21, 2025	Action RecognitionGraph Attention	—Unverified
Domain Adaptation of VLM for Soccer Video Understanding	May 20, 2025	Action ClassificationDomain Adaptation	—Unverified
A Challenge to Build Neuro-Symbolic Video Agents	May 20, 2025	Scene ClassificationVideo Retrieval	CodeCode Available
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?	May 20, 2025	Video Understanding	—Unverified
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding	May 19, 2025	Language ModelingLanguage Modelling	CodeCode Available
From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations	May 18, 2025	Video EditingVideo Understanding	—Unverified
SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation	May 13, 2025	Computational EfficiencyVideo Understanding	—Unverified
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models	May 13, 2025	FormMultiple-choice	CodeCode Available
Gameplay Highlights Generation	May 12, 2025	Event DetectionHighlight Detection	—Unverified
Seed1.5-VL Technical Report	May 11, 2025	Mixture-of-ExpertsMultimodal Reasoning	—Unverified
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant	May 8, 2025	Language ModelingLanguage Modelling	—Unverified
RAVU: Retrieval Augmented Video Understanding with Compositional Reasoning over Graph	May 6, 2025	EgoSchemaRetrieval	—Unverified
VideoLLM Benchmarks and Evaluation: A Survey	May 3, 2025	SurveyVideo Understanding	—Unverified
Empowering Agentic Video Analytics Systems with Video Language Models	May 1, 2025	Knowledge GraphsRAG	—Unverified
SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding	Apr 30, 2025	Video Understanding	CodeCode Available
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Apr 24, 2025	Caption GenerationDense Video Captioning	—Unverified
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs	Apr 23, 2025	Token ReductionVideo Understanding	—Unverified
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes	Apr 21, 2025	MMEVideo MME	—Unverified
OmniV-Med: Scaling Medical Vision-Language Model for Universal Visual Understanding	Apr 20, 2025	Language ModelingLanguage Modelling	—Unverified
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection	Apr 20, 2025	Action DetectionDecoder	—Unverified

Show:10 25 50

← PrevPage 19 of 46Next →

No leaderboard results yet.