Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 51–75 of 1149 papers

Title	Date	Tasks	Status	Hype
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders	May 30, 2025	Video Understanding	Unverified	0
SiLVR: A Simple Language-based Video Reasoning Framework	May 30, 2025	MathMME	Code Available	1
Learning reusable concepts across different egocentric video understanding tasks	May 30, 2025	Video Understanding	Unverified	0
VUDG: A Dataset for Video Understanding Domain Generalization	May 30, 2025	Domain Generalization, Multiple-choice	Unverified	0
VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software	May 30, 2025	Question Answering, Spatial Reasoning	Code Available	1
Time Blindness: Why Video-Language Models Can't See What Humans Can?	May 30, 2025	Temporal Sequences, Video Understanding	Unverified	0
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding	May 29, 2025	Avg, Video Understanding	Code Available	0
Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding	May 29, 2025	RAG, Retrieval-augmented Generation	Unverified	0
MaCP: Minimal yet Mighty Adaptation via Hierarchical Cosine Projection	May 29, 2025	Image Classification	Unverified	0
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models	May 29, 2025	Self-Supervised Learning, Video Generation	Code Available	2
PreFM: Online Audio-Visual Event Parsing via Predictive Future Modeling	May 29, 2025	Video Understanding	Code Available	1
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory	May 29, 2025	Contrastive Learning, Text Retrieval	Code Available	2
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?	May 29, 2025	Video Understanding	Code Available	1
Universal Visuo-Tactile Video Understanding for Embodied Interaction	May 28, 2025	Friction, Large Language Model	Unverified	0
VidText: Towards Comprehensive Evaluation for Video Text Understanding	May 28, 2025	Multimodal Reasoning, Optical Character Recognition (OCR)	Code Available	1
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding	May 27, 2025	Reinforcement Learning (RL), Video Understanding	Code Available	1
Two Causally Related Needles in a Video Haystack	May 26, 2025	Video Understanding, Visual Grounding	Unverified	0
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models	May 26, 2025	Video Understanding	Unverified	0
TUNA: Comprehensive Fine-grained Temporal Understanding Evaluation on Dense Dynamic Videos	May 26, 2025	Attribute, Video Understanding	Code Available	0
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs	May 25, 2025	Video Understanding	Unverified	0
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding	May 23, 2025	Form, Question Answering	Unverified	0
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles	May 22, 2025	EgoSchema, Few-Shot Learning	Unverified	0
SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding	May 22, 2025	Action Classification, Automatic Speech Recognition	Code Available	0
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design	May 22, 2025	CPU, GPU	Code Available	2
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning	May 22, 2025	Misinformation, reinforcement-learning	Code Available	1

